# Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2021 · $582,016

## Abstract

Project Summary

Alzheimer’s disease (AD) is an urgent national and international research priority. Amyloid plaques and neurofibrillary tangles are the hallmarks of AD; their building blocks are amyloid-β (Aβ) and tau, respectively. At present, we lack an understanding of the set of genes that affect the formation of plaques and tangles, along with the protective and pathological responses to these toxic peptides.

Biologists are now gathering gene expression data and Aβ and tau measures from human brain tissues. The current approach attempts to find a set of features (here, gene expression levels) that best predict an outcome (Aβ or tau level). The identified features, called biomarkers, can help determine the molecular basis for plaques and tangles. Unfortunately, false positive biomarkers are very common, as evidenced by low rates of replication in independent data and a low rate of reaching clinical practice (less than 1%). We seek to radically shift the current paradigm in biomarker discovery by resolving three fundamental problems with the current approach, using novel, theoretically well-founded machine learning (ML) methods to learn interpretable models from data.

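
As a rough illustration of why false positives are common when many features are screened against one outcome, the sketch below ranks purely random "genes" by their correlation with a purely random "outcome". The sample size, gene counts, and function names are assumptions chosen for the example, not values from the project.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def top_noise_correlation(n_samples, n_genes, seed=0):
    """Largest |correlation| between a random outcome and random 'genes'.

    Every variable is independent noise, so any strong correlation found
    here is a false positive by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_genes):
        gene = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(gene, outcome)))
    return best


# Hypothetical study sizes: 30 brain samples, 5 versus 2000 candidate genes.
few = top_noise_correlation(n_samples=30, n_genes=5)
many = top_noise_correlation(n_samples=30, n_genes=2000)
print(f"best |r| among 5 noise genes:    {few:.2f}")
print(f"best |r| among 2000 noise genes: {many:.2f}")
```

With far more features than samples, the best-ranked noise gene can look like a convincing predictor, which is consistent with the low replication rates described above.
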
Aim 1. Learn an interpretable feature representation from publicly available, high-throughput brain data. High dimensionality, hidden variables, and complex feature correlations create a discrepancy between predictability (i.e., observed statistical associations) and true biological interactions. To increase the chance of identifying true positive biomarkers, we need new feature selection criteria to learn a model that better explains, rather than simply predicts, the outcome. To do so, our proposed ML algorithms will identify the genes that are likely to give a meaningful explanation of the outcome (Aβ or tau level) by inferring both the functions of genes in the cellular processes contributing to AD and the gene interaction network from many existing brain datasets.

Aim 2. Make interpretable predictions using a unified framework to explain model predictions. Because of disease heterogeneity, complex models (e.g., deep learning or ensemble models) often describe relationships between genes and an outcome more accurately than simpler, linear models, but they lack interpretability. We will develop a novel ML framework that interprets complex model predictions by estimating the importance of each feature to a specific prediction. This will identify features of high importance for each individual as personalized markers and classify subjects based on these importance estimates.

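
The idea of estimating each feature's importance to one specific prediction can be sketched with exact Shapley values on a toy model. The abstract does not name the attribution method, so the use of Shapley values is an assumption here, and `toy_model`, the three-gene input, and the all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value. The cost
    is exponential in the number of features, so this suits toy inputs only.
    """
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(g):
    # Hypothetical nonlinear model: gene 0 acts alone, genes 1 and 2 interact.
    return 2.0 * g[0] + g[1] * g[2]


x = [1.0, 2.0, 3.0]         # expression levels for one hypothetical subject
baseline = [0.0, 0.0, 0.0]  # assumed reference expression profile
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

For this subject the attributions come out to about 2 for gene 0 and 3 each for genes 1 and 2, and they sum to `toy_model(x) - toy_model(baseline)`. Per-subject attribution vectors of this kind could then serve as inputs for grouping subjects, in the spirit of the classification step described above.
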
Aim 3. Validate the identified candidate biomarkers using powerful worm models of AD. Analyzing observational data without doing interventional experiments cannot prove causal relationships. In collaboration with co-I Matt Kaeberlein, we will utilize powerful nematode models of AD to test our hypotheses on the role of certain genes as disease modifiers, and develop a new way to refine the models based on this knowledge.
Successful co...

## Key facts

- **NIH application ID:** 10132962
- **Project number:** 5R01AG061132-03
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Su-In Lee
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $582,016
- **Award type:** 5
- **Project period:** 2019-02-15 → 2023-12-21

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10132962

## Citation

> US National Institutes of Health, RePORTER application 10132962, Interpretable Machine Learning to Identify Alzheimer's Disease Therapeutic Targets (5R01AG061132-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10132962. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*