Learning the Regulatory Code of Alzheimer's Disease Genomes

NIH RePORTER · NIH · U01 · $289,006

Abstract

Project Summary

Alternative splicing is a key cellular process whose dysregulation has been implicated broadly across human genetic disease. Dr. Knowles previously developed LeafCutter, a flexible, scalable, annotation-free tool to quantify local patterns of RNA splicing from short-read RNA-seq data. Although LeafCutter has been widely adopted, and we have actively maintained it and addressed issues on GitHub, it remains “early stage” software. We propose four software engineering improvements: 1) appropriate packaging using conda, with standard build, installation, testing and logging processes; 2) containerization using Docker; 3) standardization of input/output data formats and interfaces; and 4) refactoring to use a standard workflow language. These improvements will allow us to distribute LeafCutter through repositories including PyPI, BioConda, DockStore, and Galaxy. Finally, improvements to documentation, testing and version management will make it easier for the open source community to contribute to LeafCutter and for us to integrate those contributions.

The parent award for the proposed work is U01 AG068880-01, “Learning the Regulatory Code of Alzheimer's Disease Genomes”, under which we are developing state-of-the-art deep learning (DL) and machine learning (ML) models to better understand the genetic basis of Alzheimer's disease (AD). This award makes extensive use of LeafCutter. In Aim 1 we are building DL models of pre- and post-transcriptional regulation; for the latter, LeafCutter provides training data for our neural network model of the sequence determinants of RNA splicing in AD-relevant cell types and states. In Aim 2 we connect AD-associated structural variation to functional variation, including RNA splicing variation. In Aim 3 we will build trans-expression QTL networks across thousands of post-mortem brain samples; with the improvements to the LeafCutter ecosystem proposed here, we will be able to extend this approach straightforwardly to trans splicing QTL networks.
Although we and our collaborators are heavy users of LeafCutter ourselves, we will continue to ensure that it meets the needs and use cases of the broader genomics community.

Key facts

NIH application ID: 10406760
Project number: 3U01AG068880-02S1
Recipient: ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
Principal Investigator: David Arthur Knowles
Activity code: U01
Funding institute: NIH
Fiscal year: 2021
Award amount: $289,006
Award type: 3
Project period: 2020-09-01 → 2025-08-31