# Understanding Diabetes Heterogeneity via Mining Multimodality Interconnected Data

> **NIH K25** · EMORY UNIVERSITY · 2023 · $172,810

## Abstract

Diabetes is a prevalent and highly heterogeneous disease that incurs tremendous human, economic, and social costs globally. Prediabetes and early-stage type 2 diabetes often do not have a single strong indicator or symptom, posing great challenges for early detection and intervention. Moreover, once at a later stage, diabetic patients are at high risk of developing various health problems such as heart disease, vision loss and kidney disease, which further complicates effective healthcare and may eventually lead to consequences ranging from blindness to amputations to limited social interactions due to impaired mobility. Unfortunately, current subtyping of diabetes has failed to decouple such heterogeneity. Recent remarkable advances in biotechnology have led to a significant production of high-throughput patient data such as electronic health records (EHRs), multi-omics, and structured surveys, offering tremendous promise for powerful quantitative approaches to understanding diabetes heterogeneity. However, existing machine learning (ML) models ignore the higher-order interconnections among various disease variables, thus failing to differentiate complex fine-grained subtypes and extract subtle corresponding phenotypes regarding specific combinations of disease variables. Moreover, most existing studies focus on single sources of data such as clinical, molecular or behavioral, failing to discover integrative biomarkers towards even more effective disease detection, analysis and treatment. Often, these methods also rely heavily on dataset-specific feature preprocessing and fail to transfer from one cohort to another.

As a computer scientist aiming at bridging data science and diabetes management, I have developed a well-structured training plan in this proposed K25 project, and my primary goal is to develop a high-impact and practical ML system that can be used to perform precise detection, in-depth analysis and cost-effective treatment of diabetes. To fully decouple the heterogeneity of diabetes from complex patient data, I propose (1) a hyper-hetero-graph data structure (H2G) to facilitate the comprehensive representation of patients and deep identification of diabetic characteristics regarding the interconnections among various disease variables and (2) a specialized graph neural network model (H2GNN) capable of modeling H2G along with a temporal component to capture the full trajectories of disease progression and a self-clustering component to identify novel subtypes of diabetes. Leveraging the national All of Us dataset from NIH with EHRs, genomics and surveys of 329K+ patients (42K+ diabetic), I propose to (1) apply the model to clinical data (EHRs) towards precise early diabetes detection, (2) incorporate molecular data (genomics) towards in-depth diabetes pathological analysis, and (3) further incorporate behavioral data (surveys) towards pe...
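
The abstract does not specify how H2G is built, so the following is only a minimal illustrative sketch of the general idea of a heterogeneous hypergraph: nodes carry a type (for example diagnosis, medication, lab), and a single hyperedge can join any number of nodes at once, which is one way to represent higher-order interconnections among disease variables. The class name `HeteroHypergraph`, its methods, the node labels, and the choice of one hyperedge per patient visit are all assumptions made for illustration, not the project's design.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroHypergraph:
    """Toy heterogeneous hypergraph (hypothetical, not the project's H2G)."""
    node_type: dict = field(default_factory=dict)    # node id -> type label
    hyperedges: dict = field(default_factory=dict)   # hyperedge id -> frozenset of node ids
    incident: dict = field(default_factory=lambda: defaultdict(set))  # node id -> hyperedge ids

    def add_node(self, node_id, type_label):
        self.node_type[node_id] = type_label

    def add_hyperedge(self, edge_id, node_ids):
        members = frozenset(node_ids)
        unknown = sorted(n for n in members if n not in self.node_type)
        if unknown:
            raise KeyError(f"unknown nodes: {unknown}")
        self.hyperedges[edge_id] = members
        for n in members:
            self.incident[n].add(edge_id)

    def co_occurring(self, node_id):
        """Nodes sharing at least one hyperedge with node_id, grouped by node type."""
        grouped = defaultdict(set)
        for edge_id in self.incident[node_id]:
            for other in self.hyperedges[edge_id] - {node_id}:
                grouped[self.node_type[other]].add(other)
        return dict(grouped)


# Made-up toy records; each hyperedge stands for one hypothetical patient visit.
g = HeteroHypergraph()
g.add_node("dx:type2_diabetes", "diagnosis")
g.add_node("dx:hypertension", "diagnosis")
g.add_node("rx:metformin", "medication")
g.add_node("lab:hba1c_elevated", "lab")
g.add_hyperedge("visit_a", ["dx:type2_diabetes", "rx:metformin", "lab:hba1c_elevated"])
g.add_hyperedge("visit_b", ["dx:type2_diabetes", "dx:hypertension"])

neighbors = g.co_occurring("dx:type2_diabetes")
print(neighbors)
```

Unlike an ordinary graph, where `visit_a` would have to be split into three pairwise edges, the hyperedge keeps the three-way combination intact, which is the property the abstract refers to as higher-order interconnection.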

## Key facts

- **NIH application ID:** 10644701
- **Project number:** 1K25DK135913-01
- **Recipient organization:** EMORY UNIVERSITY
- **Principal Investigator:** Ji (Carl) Yang
- **Activity code:** K25
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $172,810
- **Award type:** 1
- **Project period:** 2023-05-01 → 2028-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10644701

## Citation

> US National Institutes of Health, RePORTER application 10644701, Understanding Diabetes Heterogeneity via Mining Multimodality Interconnected Data (1K25DK135913-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10644701. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
