# Massive Integration Genomes, Phenomes, Transcriptomes, and Electronic Medical Records to Identify Genes, Pathways, Subtypes, and Progression of Alzheimer’s Disease

> **NIH P30** · UNIVERSITY OF CHICAGO · 2020 · $404,956

## Abstract

Summary
Alzheimer's disease (AD) and related dementias are a major health burden, affecting 5.8 million individuals in the
US. The number of affected individuals is expected to grow from 5.8 million in 2019 to 14 million by 2060, making
the development of effective prevention and treatment strategies a critical public health priority.
AD has a substantial genetic component with heritability estimates around 70%. Genome-wide association studies
(GWAS) have identified many loci robustly associated with the disease, and many of the implicated pathways
seem promising. However, the translation of these discoveries into actionable targets has been slow, primarily
due to the lack of mechanistic understanding.
To address this problem, we have developed key approaches to assign function to GWAS discoveries, implemented
in the PrediXcan family of tools. As part of the GTEx consortium, we have trained and optimized prediction
models for expression and splicing traits in 49 human tissues. Building on this work, we constructed PhenomeXcan,
a knowledge base of the putative function of every heritable human gene based on the associations between
the genetically regulated component of gene expression and over 4000 human traits.
We propose here, in Specific Aim 1, to expand PhenomeXcan using the genetic components of splicing ratios and
protein levels. This multidimensional array will be interrogated with state-of-the-art statistical methods to improve
our ability to identify causal genes, pathways, and latent subtypes of AD.
The primary goal of precision medicine is to provide the right therapy for the right patient at the right time.
Therefore, the ability to cluster patients into subtypes with potentially different response to different treatments
will be key in the journey to achieving precision medicine.
Another source of large-scale health-related data is the MarketScan database, which includes electronic health
records of 250 million patients in the US. In Specific Aim 2, we propose to use this massive dataset as an orthogonal
source to identify AD subtypes. To investigate the pathogenesis and progression of AD, we will leverage the
longitudinal aspect of the EHR.
The EHR will be analyzed to find clusters of phenotypes associated with AD diagnoses, and these clusters will
be matched with the functional gene clusters found in SA1. The clusters of phenotypes and genes will be used to
inform the modeling of disease progression.
In Specific Aim 3, we will use the UK Biobank, a resource combining self-reported disease, EHR, and genotype
information, and BioVU, a separate biobank with genotype and EHR data, to validate the results from SA1 and SA2,
as well as to synthesize the two sets of results into a second iteration of disease clustering.

## Key facts

- **NIH application ID:** 10123280
- **Project number:** 3P30DK020595-43S1
- **Recipient organization:** UNIVERSITY OF CHICAGO
- **Principal Investigator:** GRAEME I BELL
- **Activity code:** P30
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $404,956
- **Award type:** 3
- **Project period:** 1996-12-01 → 2023-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10123280

## Citation

> US National Institutes of Health, RePORTER application 10123280, Massive Integration Genomes, Phenomes, Transcriptomes, and Electronic Medical Records to Identify Genes, Pathways, Subtypes, and Progression of Alzheimer’s Disease (3P30DK020595-43S1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10123280. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*