Informatics Methods for Leveraging Clinical Data Sources to Study Risk Factors for Alzheimer's Disease

NIH RePORTER · NIH · R21 · $454,731

Abstract

Project Summary

Clinical data sources such as Electronic Health Records (EHR) and medical claims data have the potential to serve as an enormous research resource to support goals such as identifying modifiable risk factors for Alzheimer’s Disease (AD), but clinical data have significant limitations including data quality and missing data challenges. To address these limitations, there is increasing interest in linking clinical data with research study data. Connecting these data sources promises to synergize research-quality outcome measures based on neuropathological data with rich information on potentially modifiable AD risk factors present in mid-life such as co-morbid conditions and medication exposures that can be derived from clinical data. However, integrating heterogeneous, inconsistently measured data types from a large clinical database (e.g., diagnosis codes, prescription medications, imaging) with more consistently measured data on a smaller, targeted study population requires development of novel informatics methodologies. Recently proposed deep learning approaches have the potential to flexibly account for the complex data availability patterns encountered in clinical data with improved predictive accuracy relative to traditional methods. However, statistical properties of downstream analyses, such as bias and variance, have not yet been evaluated following the application of these methods. Moreover, specialized methods are needed to impute complex data types such as high-dimensional neuroimaging data. In addition to addressing missingness within a clinical data-derived dataset, methods are needed to facilitate combining information from clinical databases and research study data. In this research, we propose to address the challenges of integrating clinical and research databases by harnessing deep learning and transfer learning.
We will use neuroimaging data from the Alzheimer’s Disease Neuroimaging Initiative and cohort study data from the Adult Changes in Thought study in combination with clinical data from the Kaiser Permanente Washington EHR to develop novel informatics approaches to identification of risk factors for AD. In Aim 1, we will develop a deep learning approach to data integration for heterogeneous clinical data linked to research study data, accounting for complex missing data patterns encountered in clinical data. In Aim 2, we will develop a data integration framework to support transfer learning from clinical data to research study data to advance statistical inference about risk factors for AD. The long-term goal of our research program is to accelerate research on AD by facilitating integration of heterogeneously collected clinical data and research study data to capitalize on the unique strengths of each data source.

Key facts

NIH application ID: 10352791
Project number: 1R21AG075574-01
Recipient: UNIVERSITY OF PENNSYLVANIA
Principal Investigator: Rebecca Hubbard
Activity code: R21
Funding institute: NIH
Fiscal year: 2022
Award amount: $454,731
Award type: 1
Project period: 2022-02-15 → 2025-01-31