# Causal and integrative deep learning for Alzheimer's disease genetics

> **NIH U01** · UNIVERSITY OF MINNESOTA · 2021 · $733,352

## Abstract

In response to PAR-19-269, “Cognitive Systems Analysis of Alzheimer's Disease Genetic and Phenotypic Data”, we propose developing and applying more powerful and robust machine learning methods for causal and integrative analysis, especially deep learning approaches for instrumental variable analysis, to identify causal risk/protective factors for Alzheimer's disease (AD) in the post-GWAS era by leveraging published large-scale GWAS, whole-genome sequencing (WGS) and other omic and neuroimaging data. Our main motivation is to extend an emerging and increasingly influential approach of integrating GWAS with gene expression data, called transcriptome-wide association studies (TWAS), aiming to improve over the current practice of GWAS by not only increasing statistical power, but also identifying (putative) causal genes, thus gaining insights into the genetic basis of common diseases and complex traits. The statistical principle underlying TWAS is the (two-sample) two-stage least squares (2SLS) for linear models in the framework of instrumental variable (IV) analysis for causal inference. In practice, however, TWAS may fail to identify true causal genes while giving false positives due to the violation of its modeling assumptions, e.g., due to non-linear effects of IVs or gene expression, or due to invalid IVs (in the presence of horizontal pleiotropy of SNPs).

First, we propose developing linear models and neural network models incorporating a large number of functional annotations on the genome (e.g. various types of functional genomic and epigenetic data from the ENCODE and Roadmap Epigenomics projects) as prior knowledge to improve imputing/predicting gene expression (or other molecular or imaging endophenotypes or complex traits/diseases) via SNPs, corresponding to the first stage of 2SLS.

Second, we propose neural networks as more flexible non-linear models for the second stage of 2SLS in the presence of invalid IVs, which may be the SNPs having direct (or horizontal pleiotropic) effects on the outcome as expected from the wide-spread pleiotropy. Then we combine the approaches in the above two stages to form a more flexible and robust neural network approach as an extension of 2SLS for causal inference.

Third, we consider inferring causal directions between two traits, e.g. a gene's expression and AD, allowing non-linear relationships between SNPs and traits and between the two traits. This is critical in reducing false positives, e.g. due to reverse causation, but has been largely under-studied.

Fourth, we apply the new (and existing) methods to transcriptomic, proteomic, neuroimaging and AD GWAS/WGS data to identify (putative) causal genes, proteins and brain regions of interest (ROIs) for AD, while building the corresponding genetic prediction models for endophenotypes and AD risk.

Finally, we will develop and disseminate publicly available software implementing the proposed analysis methods, e.g. as Python programs or R package...
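
The abstract names two-sample 2SLS for linear models as the statistical principle behind TWAS. As a rough illustration only, the sketch below runs that baseline on simulated data with NumPy. It is not the project's software or its proposed neural network extension. The function names (`simulate`, `ols`, `two_sample_2sls`), the effect sizes, the allele frequency, and the sample sizes are all hypothetical choices made for this example. It assumes valid instruments (no horizontal pleiotropy) and purely linear effects, which are exactly the assumptions the abstract says may fail in practice.

```python
import numpy as np

def simulate(n, alpha, beta, rng):
    """Simulate SNP genotypes G, an exposure X (e.g. gene expression), and an outcome Y.

    U is an unobserved confounder that affects both X and Y. The SNPs affect Y
    only through X, so they are valid instruments in this toy setup.
    """
    p = alpha.shape[0]
    G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # allele counts 0/1/2
    U = rng.normal(size=n)
    X = G @ alpha + U + rng.normal(size=n)
    Y = beta * X + U + rng.normal(size=n)
    return G, X, Y

def ols(A, b):
    """Least-squares coefficients of b on A, with an intercept as the first entry."""
    A1 = np.column_stack([np.ones(len(A)), A])
    coef, *_ = np.linalg.lstsq(A1, b, rcond=None)
    return coef

def two_sample_2sls(G1, X1, G2, Y2):
    """Two-sample 2SLS: fit stage 1 in sample 1, apply it and fit stage 2 in sample 2."""
    w = ols(G1, X1)                 # stage 1: predict exposure from SNPs
    X2_hat = w[0] + G2 @ w[1:]      # genetically predicted exposure in sample 2
    return ols(X2_hat, Y2)[1]       # stage 2: slope of outcome on predicted exposure

rng = np.random.default_rng(0)
alpha = np.full(10, 0.3)            # hypothetical SNP effects on the exposure
beta_true = 0.5                     # hypothetical causal effect of X on Y

G1, X1, _ = simulate(5000, alpha, beta_true, rng)    # "expression reference" sample
G2, X2, Y2 = simulate(20000, alpha, beta_true, rng)  # "GWAS" sample

beta_2sls = two_sample_2sls(G1, X1, G2, Y2)
beta_naive = ols(X2, Y2)[1]         # confounded regression of Y on observed X

print(f"true effect:      {beta_true:.3f}")
print(f"two-sample 2SLS:  {beta_2sls:.3f}")
print(f"naive regression: {beta_naive:.3f}")
```

In this simulation the naive regression of the outcome on the observed exposure absorbs the confounder, while the 2SLS estimate uses only the genetically predicted part of the exposure. Adding a direct SNP-to-outcome term to `simulate` would be one way to see how invalid instruments distort the linear 2SLS estimate.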

## Key facts

- **NIH application ID:** 10267373
- **Project number:** 1U01AG073079-01
- **Recipient organization:** UNIVERSITY OF MINNESOTA
- **Principal Investigator:** Wei Pan
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $733,352
- **Award type:** 1
- **Project period:** 2021-09-15 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10267373

## Citation

> US National Institutes of Health, RePORTER application 10267373, Causal and integrative deep learning for Alzheimer's disease genetics (1U01AG073079-01). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10267373. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*