# Informatics Methods for Leveraging Clinical Data Sources to Study Risk Factors for Alzheimer's Disease

> **NIH R21** · UNIVERSITY OF PENNSYLVANIA · 2022 · $454,731

## Abstract

Project Summary
Clinical data sources such as Electronic Health Records (EHR) and medical claims data could serve as an enormous research resource, supporting goals such as identifying modifiable risk factors for Alzheimer’s Disease (AD). Yet clinical data also have significant limitations, including data-quality and missing-data challenges. To address these limitations, there is increasing interest in linking clinical data with research study data. Connecting these data sources promises to combine research-quality outcome measures based on neuropathological data with rich information, derivable from clinical data, on potentially modifiable AD risk factors present in mid-life, such as co-morbid conditions and medication exposures. However, integrating heterogeneous, inconsistently measured data types from a large clinical database (e.g., diagnosis codes, prescription medications, imaging) with more consistently measured data on a smaller, targeted study population requires the development of novel informatics methodologies.
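The linkage problem described above can be made concrete with a small sketch: join a clinical extract to a research cohort on a shared participant ID, then count which combinations of clinical fields are actually observed. Everything here (the IDs, the field names `diabetes_code`, `statin_rx`, and `mri`, and all values) is hypothetical and only illustrates the kind of missing-data patterns a linked dataset can contain; it is not the project's pipeline.

```python
from collections import Counter

# Hypothetical clinical (EHR) extract keyed by a shared study ID;
# None marks a value that was never recorded in the clinical record.
ehr = {
    "p1": {"diabetes_code": 1, "statin_rx": 1, "mri": None},
    "p2": {"diabetes_code": 0, "statin_rx": None, "mri": None},
    "p3": {"diabetes_code": None, "statin_rx": 0, "mri": 0.82},
    "p4": {"diabetes_code": 1, "statin_rx": 0, "mri": None},
}
# Hypothetical research cohort: fewer people, consistently measured outcome.
cohort = {"p1": {"dementia": 1}, "p3": {"dementia": 0}, "p4": {"dementia": 0}}


def link(ehr, cohort):
    """Inner-join the two sources on the shared participant ID."""
    return {pid: {**ehr[pid], **cohort[pid]} for pid in cohort if pid in ehr}


def missingness_patterns(rows, fields):
    """Count each pattern of observed (1) / missing (0) values across `fields`."""
    return Counter(
        tuple(int(row[f] is not None) for f in fields) for row in rows.values()
    )


linked = link(ehr, cohort)
patterns = missingness_patterns(linked, ["diabetes_code", "statin_rx", "mri"])
```

In this toy example the three linked participants fall into two patterns: two lack imaging and one lacks a diagnosis-code value. An integration method therefore has to handle mixed patterns rather than a single missing column.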

Recently proposed deep learning approaches have the potential to flexibly account for the complex data-availability patterns encountered in clinical data, with improved predictive accuracy relative to traditional methods. However, the statistical properties of downstream analyses performed after applying these methods, such as bias and variance, have not yet been evaluated. Moreover, specialized methods are needed to impute complex data types such as high-dimensional neuroimaging data. Beyond addressing missingness within a dataset derived from clinical data, methods are also needed to combine information from clinical databases and research study data.

In this research, we propose to address the challenges of integrating clinical and research databases by harnessing deep learning and transfer learning. We will use neuroimaging data from the Alzheimer’s Disease Neuroimaging Initiative and cohort study data from the Adult Changes in Thought study, in combination with clinical data from the Kaiser Permanente Washington EHR, to develop novel informatics approaches for identifying risk factors for AD. In Aim 1, we will develop a deep learning approach to integrating heterogeneous clinical data linked to research study data, accounting for the complex missing-data patterns encountered in clinical data. In Aim 2, we will develop a data integration framework that supports transfer learning from clinical data to research study data to advance statistical inference about risk factors for AD. The long-term goal of our research program is to accelerate AD research by facilitating the integration of heterogeneously collected clinical data and research study data, capitalizing on the unique strengths of each data source.
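One simple, well-known way to borrow strength from a large source dataset when analyzing a small target dataset is to shrink the target estimate toward the source estimate with a ridge-type penalty. The sketch below shows that idea for a single regression slope, with a large synthetic "clinical" sample as the source and small synthetic "research" samples as the target. It is not the framework proposed for Aim 2; the true slopes (0.9 in the source, 1.0 in the target), the sample sizes, and the penalty `lam` are arbitrary assumptions chosen for illustration.

```python
import random


def slope_through_origin(x, y, prior=0.0, lam=0.0):
    """Minimize sum((y - b*x)**2) + lam * (b - prior)**2 over b.

    lam = 0 gives ordinary least squares on (x, y); as lam grows the
    estimate is pulled toward `prior` (here, the source-data slope).
    """
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return (sxy + lam * prior) / (sxx + lam)


def draw(rng, n, slope):
    """Synthetic mean-zero covariate and outcome with the given true slope."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]
    return x, y


rng = random.Random(2022)
TRUE_TARGET, TRUE_SOURCE = 1.0, 0.9  # assumed: related but not identical populations

# Large source dataset: fit once.
xs, ys = draw(rng, 20_000, TRUE_SOURCE)
source_slope = slope_through_origin(xs, ys)

# Many small target datasets: compare target-only vs. transfer estimates.
se_target, se_transfer = [], []
for _ in range(500):
    xt, yt = draw(rng, 30, TRUE_TARGET)
    se_target.append((slope_through_origin(xt, yt) - TRUE_TARGET) ** 2)
    b_transfer = slope_through_origin(xt, yt, prior=source_slope, lam=30.0)
    se_transfer.append((b_transfer - TRUE_TARGET) ** 2)

mse_target = sum(se_target) / len(se_target)
mse_transfer = sum(se_transfer) / len(se_transfer)
```

Setting `lam=0` recovers the target-only least-squares fit, and a very large `lam` returns essentially the source slope. Choosing `lam` (for example, by cross-validation) trades the variance reduction gained from the source data against the bias introduced when the two populations differ.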

## Key facts

- **NIH application ID:** 10352791
- **Project number:** 1R21AG075574-01
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Rebecca Hubbard
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $454,731
- **Award type:** 1 (new application)
- **Project period:** 2022-02-15 → 2025-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10352791

## Citation

> US National Institutes of Health, RePORTER application 10352791, Informatics Methods for Leveraging Clinical Data Sources to Study Risk Factors for Alzheimer's Disease (1R21AG075574-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10352791. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
