# Learning the Regulatory Code of Alzheimer's Disease Genomes

> **NIH U01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2021 · $289,006

## Abstract

Alternative splicing is a key cellular process whose dysregulation has been implicated broadly
across human genetic disease. Dr. Knowles previously developed LeafCutter, a flexible, scalable,
annotation-free tool to quantify local patterns of RNA splicing from short-read RNA-seq data.
While LeafCutter has been quite widely adopted and we have actively maintained it and
addressed issues on GitHub, it remains “early stage” software. We propose software
engineering improvements: 1) appropriate packaging using conda with standard build,
installation, testing and logging processes, 2) containerization using Docker, 3) standardization
of input/output data formats/interfaces, and 4) refactoring to use a standard workflow
language. These improvements will allow us to distribute LeafCutter through repositories
including PyPI, BioConda, Dockstore, and Galaxy. Finally, improvements to documentation,
testing and version management will make contributions to LeafCutter from the open-source
community more feasible and easier to integrate.

The parent award for the proposed work is U01 AG068880-01 “Learning the Regulatory Code of
Alzheimer's Disease Genomes”, where we are developing state-of-the-art deep learning (DL)
and machine learning (ML) models to better understand the genetic basis of AD. This award
makes extensive use of LeafCutter. In Aim 1 we are building DL models of pre- and
post-transcriptional regulation: for the latter, LeafCutter provides training data for our neural
network model of the sequence determinants of RNA splicing in AD-relevant cell types and
states. In Aim 2 we connect AD-associated structural variation to functional variation, including
RNA splicing variation. In Aim 3, we will build trans-expression QTL networks across thousands
of post-mortem brain samples: with the improvements to the LeafCutter ecosystem proposed
here we will be able to straightforwardly extend to trans splicing QTL networks. While we and
our collaborators are ourselves heavy users of LeafCutter, we will continue to ensure we
provide for the needs and use-cases of the broader genomics community.

## Key facts

- **NIH application ID:** 10406760
- **Project number:** 3U01AG068880-02S1
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** David Arthur Knowles
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $289,006
- **Award type:** 3 (supplement to an existing award)
- **Project period:** 2020-09-01 → 2025-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10406760

## Citation

> US National Institutes of Health, RePORTER application 10406760, Learning the Regulatory Code of Alzheimer's Disease Genomes (3U01AG068880-02S1). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10406760. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
