# Unravelling genetic basis of comorbidity using EHR-linked biobank data

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2022 · $474,964

## Abstract

Rapid progress in translational bioinformatics and clinical informatics for precision medicine has 
produced many computational and informatics methods that support better prediction, diagnosis and 
treatment in clinical practice. In particular, high-dimensional, large-scale biomedical data sets, 
ranging from clinical data to ’omics data, provide an unprecedented opportunity to translate new 
knowledge from biomedical big-data analytics into support for clinical decisions. The complexity and 
scale of these data sets hold great promise, yet present substantial challenges. Comorbidity, one of 
the most important concerns for clinicians, is a well-documented phenomenon in which one or more 
medical conditions coexist and potentially interact with one another, thereby influencing the 
primary clinical condition. Several studies show that the number of comorbid conditions present at 
one time varies, and that patterns of disease presentation differ from one chronic condition to 
another. There is therefore a clear need to improve care for individuals with multiple 
comorbidities, but doing so requires a far more detailed understanding of disease-association 
trends than we currently possess. Previous studies have focused primarily on a handful of specific 
comorbidities; investigating the underlying causes of comorbidity across the broader human 
diseasome has been challenging. Fortunately, over the past decade comprehensive collections of 
disease diagnosis data have become available, primarily from electronic health records (EHRs). 
Using a patient’s health history retrospectively, we can identify comorbidities and apply a 
data-driven approach to studying comorbidity patterns that considers all possible comorbidities. 
In particular, developing computational models of large-scale data that integrate newly defined 
comorbidity patterns with genomics holds great potential for uncovering the molecular mechanisms of 
disease. Our primary aim is to elucidate the genetic and non-genetic factors that influence disease 
comorbidity. We will apply two orthogonal approaches to identify comorbidities: 1) deriving them 
from disease co-occurrence using EHR data alone, and 2) deriving them from pleiotropic genetic 
associations using the EHR-linked biobank dataset. Network-based approaches have the potential to 
uncover unexpected relationships between diseases. One of the most significant advantages of our 
proposal is that a single-source EHR is linked to genomic data, which makes it possible to revisit 
individual-level genotype and phenotype data, design more targeted studies, and ask more specific 
questions. Our results can also be used to develop a novel comorbidity risk score that combines 
clinical data and genetic effects, which might constitute a new tool for clinical prevention and 
monitoring. These goals are very much in keeping
...
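The abstract names two orthogonal routes to comorbidity (disease co-occurrence in EHR data, and pleiotropic genetic associations) without stating which statistics the project uses. The sketch below illustrates one common choice for each, under stated assumptions: the relative risk (RR) of co-occurrence for a disease pair, RR = C_ij · N / (P_i · P_j), and an edge between two diseases whenever they share at least one variant below an assumed genome-wide significance cutoff of 5e-8. The function names `cooccurrence_rr` and `pleiotropy_edges`, the input layouts, the threshold, and the toy data are hypothetical illustrations, not the project's actual pipeline.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_rr(patients):
    """Relative risk (RR) of co-occurrence for every observed disease pair.

    Hypothetical input: a list of sets, one set of diagnosis codes per patient.
    RR_ij = C_ij * N / (P_i * P_j), where N is the number of patients, P_i the
    number of patients diagnosed with disease i, and C_ij the number with both.
    RR > 1 means the pair co-occurs more often than expected by chance.
    """
    n = len(patients)
    prevalence = Counter()
    pair_counts = Counter()
    for diagnoses in patients:
        prevalence.update(diagnoses)
        # Sort so each unordered pair is always stored under the same key.
        pair_counts.update(combinations(sorted(diagnoses), 2))
    return {
        (a, b): c * n / (prevalence[a] * prevalence[b])
        for (a, b), c in pair_counts.items()
    }


def pleiotropy_edges(assoc, p_threshold=5e-8):
    """Link two diseases that share at least one significant variant.

    Hypothetical input: dict mapping disease -> {variant_id: p_value}.
    The 5e-8 default is an assumed genome-wide significance cutoff.
    Returns {(disease_a, disease_b): set_of_shared_variants}.
    """
    hits = {
        disease: {v for v, p in pvals.items() if p < p_threshold}
        for disease, pvals in assoc.items()
    }
    edges = {}
    for a, b in combinations(sorted(hits), 2):
        shared = hits[a] & hits[b]
        if shared:
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    patients = [{"T2D", "HTN"}, {"T2D", "HTN", "CKD"}, {"HTN"}, {"CKD"}]
    print(cooccurrence_rr(patients))
    assoc = {
        "T2D": {"rs1": 1e-9, "rs2": 0.01},
        "CKD": {"rs1": 3e-8, "rs3": 1e-10},
        "HTN": {"rs3": 0.2},
    }
    print(pleiotropy_edges(assoc))  # {('CKD', 'T2D'): {'rs1'}}
```

Both functions return weighted disease-pair edge lists that could feed a network analysis; comparing the two networks is one way to ask which co-occurring diseases also share genetic signal. A real analysis would also need to handle confounders such as age and sex, and to test RR values for significance.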

## Key facts

- **NIH application ID:** 10460229
- **Project number:** 5R01GM138597-03
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Dokyoon Kim
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $474,964
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2020-08-01 → 2024-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10460229

## Citation

> US National Institutes of Health, RePORTER application 10460229, Unravelling genetic basis of comorbidity using EHR-linked biobank data (5R01GM138597-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10460229. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
