# Statistical Methods for Addressing Disease Under-diagnosis Using Electronic Health Record Data

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2024 · $619,057

## Abstract

Under-diagnosis occurs when an individual living with a disease condition has not
received a diagnosis. Reasons for under-diagnosis are often complex and
context-specific, and its extent may vary across sensible population subgroups,
leading to disparities in care. Electronic Health Records (EHRs) contain a wealth of
health information on patients, and diagnosed and under-diagnosed patients may have
similar EHR profiles that differ from those of condition-free patients. Therefore,
EHRs provide a unique opportunity to address under-diagnosis in the standard
healthcare setting. Fully exploiting this opportunity is challenging, however,
because under-diagnosed patients are hidden among the large number of
condition-free patients. Noting that patients who have been diagnosed
with the condition can be identified from EHRs, we propose that EHR data, when
enriched with additional disease labels from a small-scale targeted screening, allows
development of data-driven approaches for identifying under-diagnosed patients and
assessing disparity in under-diagnosis. To this end, we will develop an arsenal of
statistical and machine learning methods and accompanying software tools to
address under-diagnosis. Our methods enable (1) a risk-based approach to
identifying patients in EHRs who are most likely to have a missed diagnosis (Aim 1); (2)
unbiased comparison between diagnosed and under-diagnosed patients to
understand disparity in under-diagnosis (Aim 2); and (3) leveraging of existing
models and targeted screening data to address under-diagnosis in a new clinical
setting (Aim 3). We will apply the developed methods to address under-diagnosis in Primary
Aldosteronism and Familial Hypercholesterolemia using data from Penn Medicine
and VA EHRs.

## Key facts

- **NIH application ID:** 10779887
- **Project number:** 1R01LM014401-01
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Jinbo Chen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $619,057
- **Award type:** 1
- **Project period:** 2024-09-04 → 2029-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10779887

## Citation

> US National Institutes of Health, RePORTER application 10779887, Statistical Methods for Addressing Disease Under-diagnosis Using Electronic Health Record Data (1R01LM014401-01). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10779887. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
