# Learning the Regulatory Code of Alzheimer's Disease Genomes

> **NIH U01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2021 · $1,094,724

## Abstract

With ageing populations worldwide, neurodegenerative diseases are placing an ever-increasing 
burden on long-term well-being, healthcare costs and family life. Despite decades of research and 
enormous investment, no disease-modifying treatment is available for the most common of these 
diseases: Alzheimer’s disease (AD). Most of these efforts, unsuccessful to date, have focused 
on one potential cause of AD: amyloid-β aggregation. Combining population-scale data 
collection, human genetics and machine learning provides a way forward to uncover and characterize 
new causal cellular processes involved in AD. This would provide an array of potential therapeutic 
targets, increasing the chance that one will be more easily modulated than the amyloid-β pathway. 
AD-specific genomic datasets of unprecedented scale are being actively collected: whole genome 
sequencing (WGS) from ~20k individuals, gene expression (RNA-seq) and epigenomics (ATAC-seq, 
histone ChIP-seq) from >1000 post-mortem AD brains, single-cell transcriptomes and similar modalities in peripheral and 
brain-resident innate immune cells (which we and others have shown to be AD-relevant). Effectively 
integrating these diverse data to better understand AD represents a substantial computational 
challenge, both in terms of data scale and analysis complexity. This proposal leverages 
state-of-the-art deep learning (DL) and machine learning (ML), combined with human genetic 
analyses, to address this challenge. We will train DL models to predict epigenomic signals and 
RNA splicing from genomic sequence, enabling in silico mutagenesis to estimate the 
functional impact (a “delta score”) of any genetic variant. The delta scores will be used in 
genetic analyses that distinguish causal associations (cellular changes that drive AD 
pathogenesis) from downstream or side effects of disease. Delta scores will also aid in 
associating both rare and common variants with AD. To achieve sufficient power, rare variants must be 
aggregated (e.g. per gene); delta scores will allow filtering out many likely non-functional 
(particularly non-coding) variants. Most common variants from AD genome-wide association studies 
(GWAS) are simply correlated with the causal variant due to linkage disequilibrium (LD). Delta 
scores, combined with trans-ethnic GWAS, will enable estimation of the likely causal variant(s). 
These analyses will highlight variants and genes involved in AD. However, genes do not operate in a 
vacuum, so robust probabilistic ML will be used to learn cell-type- and disease-specific gene 
regulatory networks from sorted bulk and single-cell RNA-seq. The detected networks will be 
integrated with our genetic findings to discover network neighborhoods/pathways especially 
enriched in AD variants. Such pathways will be prime candidates for future functional and 
therapeutic studies of AD.
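
The in silico mutagenesis step described above can be illustrated with a minimal sketch. It is not the project's implementation: `toy_model` is a hypothetical stand-in (it scores a sequence by GC fraction) for a trained sequence-to-signal deep learning model, and the function names, 0-based positions and the `min_abs_delta` threshold are assumptions made for illustration only. The sketch computes a delta score as the predicted signal for the alternate allele minus that for the reference allele, then uses it to filter rare variants before they would be aggregated per gene.

```python
from typing import Callable, List, Tuple

Variant = Tuple[int, str]  # (0-based position in the reference window, alternate base)


def toy_model(seq: str) -> float:
    """Hypothetical stand-in for a trained model: returns the GC fraction.

    A real model would predict an epigenomic or splicing signal from sequence.
    """
    return sum(base in "GC" for base in seq) / len(seq)


def delta_score(model: Callable[[str], float], ref_seq: str, pos: int, alt: str) -> float:
    """Predicted signal for the alternate sequence minus the reference sequence."""
    if len(alt) != 1 or alt not in "ACGT":
        raise ValueError("this sketch handles single-nucleotide variants only")
    if not 0 <= pos < len(ref_seq):
        raise IndexError("variant position lies outside the reference window")
    # In silico mutagenesis: substitute one base and re-run the model.
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return model(alt_seq) - model(ref_seq)


def filter_rare_variants(
    model: Callable[[str], float],
    ref_seq: str,
    variants: List[Variant],
    min_abs_delta: float,
) -> List[Variant]:
    """Keep variants whose absolute delta score reaches the (assumed) threshold."""
    return [
        (pos, alt)
        for pos, alt in variants
        if abs(delta_score(model, ref_seq, pos, alt)) >= min_abs_delta
    ]


# Illustrative inputs only; not real genomic data.
ref = "ATATATATAT"
variants = [(0, "G"), (1, "A"), (4, "C")]
kept = filter_rare_variants(toy_model, ref, variants, min_abs_delta=0.05)
print(kept)
```

Under the toy model, the T→A substitution at position 1 leaves the predicted signal unchanged and is dropped, while the two substitutions that introduce a G or C are retained. With a real model, the retained set would then feed a gene-level aggregation test.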

## Key facts

- **NIH application ID:** 10247588
- **Project number:** 5U01AG068880-02
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** David Arthur Knowles
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $1,094,724
- **Award type:** 5
- **Project period:** 2020-09-01 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10247588

## Citation

> US National Institutes of Health, RePORTER application 10247588, Learning the Regulatory Code of Alzheimer's Disease Genomes (5U01AG068880-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10247588. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*