# EHR-based vs population-based CVD risk predictions for older patients with diabetes

> **NIH R01** · NEW YORK UNIVERSITY SCHOOL OF MEDICINE · 2021 · $558,359

## Abstract

Since 2010, clinical medicine and public health have benefited from a rapid surge of clinical research on
chronic diseases using data from electronic health records (EHRs). However, while millions of patient records
are included in large EHR networks, they are not population-representative random samples, a constraint
that has limited their utility for population health research. The non-representative nature of EHR patient
populations also poses a major challenge when performing cross-site validation of EHR-based
findings, as study findings tend to reflect the unique characteristics of populations served by specific health
care systems. We propose to perform an integrated secondary data analysis of four unique datasets: 1) the
Health and Retirement Study (HRS, begun in 1992 and ongoing) that has nationally representative health
interview data for over 20 years, as well as biomarkers, physical assessment information, prescription drug
data, and claims linkages including Medicare Part D drug claims; 2) the New York University Langone Health EHR
data (NYU-CDRN, 2009 to present) including demographics, vitals, diagnoses, lab results, prescriptions and
procedures; 3) the New York City Clinical Data Research Network (NYC-CDRN) which is an EHR network that
comprises 20 NYC healthcare institutions, including the NYU-CDRN, with longitudinally linked data on over 12
million patient encounters under a Common Data Model; and 4) Veterans Affairs Ann Arbor Healthcare System
(VAAAHS) Corporate Data Warehouse (CDW), which provides an important complement to the NYC-CDRN
patient population when assessing our method’s reproducibility and generalizability for the rural patient
population in care. We will leverage these four datasets to support three strands of questions on EHR-based
risk predictions: 1) assessing their utility for population inference, 2) developing individualized absolute risk
predictions, and 3) assessing their reproducibility through cross-site validation. We will predict risk of subsequent
incident cardiovascular disease (CVD) in older patients (age 50 and older) with type 2 diabetes (T2DM).
These methods should be broadly applicable to other disease outcomes. To achieve these
objectives, our study will 1) develop and validate EHR phenotyping and diagnosis time algorithms against gold
standard chart review (Aim 1); 2) assess the population-generalizability of EHR-based risk estimation models
by comparing with cohort-based risk estimation models and develop EHR bias adjustment methods for
population inference (Aim 2); 3) develop methods for EHR-based individualized absolute risk prediction (Aim 3);
and 4) establish the developed methods via cross-site validation (Aim 4).

## Key facts

- **NIH application ID:** 10239231
- **Project number:** 5R01AG065330-02
- **Recipient organization:** NEW YORK UNIVERSITY SCHOOL OF MEDICINE
- **Principal Investigator:** Hua Judy Zhong
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $558,359
- **Award type:** 5
- **Project period:** 2020-08-15 → 2025-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10239231

## Citation

> US National Institutes of Health, RePORTER application 10239231, EHR-based vs population-based CVD risk predictions for older patients with diabetes (5R01AG065330-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10239231. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*