Understanding Diabetes Heterogeneity via Mining Multimodality Interconnected Data

NIH RePORTER · NIH · K25 · $171,310

Abstract

Diabetes is a prevalent and highly heterogeneous disease that incurs tremendous human, economic, and social costs globally. Prediabetes and early-stage type 2 diabetes often have no single strong indicator or symptom, which makes early detection and intervention very difficult. Moreover, once the disease reaches a later stage, diabetic patients are at high risk of developing various health problems such as heart disease, vision loss and kidney disease, which further complicates effective healthcare and may eventually lead to consequences ranging from blindness to amputations to limited social interactions due to reduced mobility. Unfortunately, current subtyping of diabetes has failed to decouple such heterogeneity. Recent remarkable advances in biotechnology have produced large volumes of high-throughput patient data such as electronic health records (EHRs), multi-omics, and structured surveys, which hold great promise for powerful quantitative approaches to understanding diabetes heterogeneity. However, existing machine learning (ML) models ignore the higher-order interconnections among various disease variables, and thus fail to differentiate complex fine-grained subtypes and to extract the subtle corresponding phenotypes defined by specific combinations of disease variables. Moreover, most existing studies focus on a single source of data, such as clinical, molecular or behavioral data, and so fail to discover integrative biomarkers for more effective disease detection, analysis and treatment. Often, these methods also rely heavily on dataset-specific feature preprocessing and fail to transfer from one cohort to another.
As a computer scientist aiming to bridge data science and diabetes management, I have developed a well-structured training plan in this proposed K25 project, and my primary goal is to develop a high-impact and practical ML system that can be used to perform precise detection, in-depth analysis and cost-effective treatment of diabetes. To fully decouple the heterogeneity of diabetes from complex patient data, I propose (1) a hyper-hetero-graph data structure (H2G) to facilitate the comprehensive representation of patients and deep identification of diabetic characteristics regarding the interconnections among various disease variables and (2) a specialized graph neural network model (H2GNN) capable of modeling H2G, along with a temporal component to capture the full trajectories of disease progression and a self-clustering component to identify novel subtypes of diabetes. Leveraging the national All of Us dataset from NIH with EHRs, genomics and surveys of 329K+ patients (42K+ diabetic), I propose to (1) apply the model to clinical data (EHRs) towards precise early diabetes detection, (2) incorporate molecular data (genomics) towards in-depth diabetes pathological analysis, and (3) further incorporate behavioral data (surveys) towards pe...
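
The abstract does not specify how H2G is built, so the following is only a minimal illustrative sketch of the general idea of a heterogeneous hypergraph: nodes carry a type (for example clinical, molecular or behavioral variables), and a single hyperedge can connect any number of them at once, which is one way to represent a higher-order interconnection among disease variables. The class name `HeteroHypergraph`, its methods, and all node and edge labels are hypothetical and are not taken from the project.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HeteroHypergraph:
    """Toy typed hypergraph (illustrative only, not the project's H2G)."""
    node_type: dict = field(default_factory=dict)    # node id -> type label
    hyperedges: dict = field(default_factory=dict)   # edge id -> frozenset of node ids
    incident: dict = field(default_factory=lambda: defaultdict(set))  # node id -> edge ids

    def add_node(self, node, kind):
        self.node_type[node] = kind

    def add_hyperedge(self, edge, nodes):
        members = frozenset(nodes)
        unknown = {n for n in members if n not in self.node_type}
        if unknown:
            raise KeyError(f"unknown nodes: {sorted(unknown)}")
        self.hyperedges[edge] = members
        for n in members:
            self.incident[n].add(edge)

    def co_occurring(self, node):
        """Nodes sharing at least one hyperedge with `node`, grouped by type."""
        grouped = defaultdict(set)
        for edge in self.incident[node]:
            for other in self.hyperedges[edge] - {node}:
                grouped[self.node_type[other]].add(other)
        return dict(grouped)


# Made-up example: two records, each stored as one hyperedge over several variables.
g = HeteroHypergraph()
g.add_node("type2_diabetes", "diagnosis")
g.add_node("hba1c", "lab")
g.add_node("metformin", "drug")
g.add_node("example_variant", "gene")
g.add_hyperedge("record_1", ["type2_diabetes", "hba1c", "metformin"])
g.add_hyperedge("record_2", ["type2_diabetes", "example_variant"])

related = g.co_occurring("type2_diabetes")
```

Here `related` groups every variable that appears together with the diagnosis node by its type, which an ordinary pairwise graph could only express by breaking each record into separate edges.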

Key facts

NIH application ID: 10817842
Project number: 5K25DK135913-02
Recipient: EMORY UNIVERSITY
Principal Investigator: Ji (Carl) Yang
Activity code: K25
Funding institute: NIH
Fiscal year: 2024
Award amount: $171,310
Award type: 5
Project period: 2023-05-01 → 2028-04-30