Transfer learning leveraging large-scale transcriptomics to map disrupted gene networks in cardiovascular disease

NIH RePORTER · NIH · DP5 · $472,500

Abstract

Mapping the gene regulatory networks that drive human disease enables the design of network-correcting treatments that target the core disease mechanism rather than merely managing symptoms. I previously developed a framework that combines machine learning with human induced pluripotent stem cell modeling to map disease-dependent gene networks and enable network-based screening. This framework identified a promising network-correcting therapy for cardiac valve disease that is now progressing toward clinical trials, as reported in Cell¹ and Science². However, computationally inferring a network map requires large amounts of transcriptomic data to learn the connections between genes. This requirement impedes network-correcting drug discovery in data-limited settings, including rare diseases and diseases affecting clinically inaccessible tissues. Although data remain limited in these settings, recent advances in sequencing technologies have driven a rapid expansion in the transcriptomic data available from human tissues more broadly. Meanwhile, transfer learning has revolutionized fields such as natural language understanding and computer vision: deep learning models pretrained on large-scale general datasets can be fine-tuned for a vast array of downstream tasks using application-specific data that would be too scarce to yield meaningful predictions in isolation. To test whether an analogous approach could enable gene network predictions with limited data, I developed Geneformer, a novel deep learning model, and pretrained it on a corpus I assembled of ~30 million human single-cell transcriptomes. The resulting pretrained checkpoint provides a starting point for fine-tuning toward a broad range of downstream applications, with the goal of accelerating the discovery of key network regulators and candidate network-correcting therapies.
Geneformer consistently boosted predictive accuracy across a diverse panel of downstream tasks using only a limited set of task-specific training examples. I now propose to leverage Geneformer’s learned understanding of contextual gene network dynamics to address two major challenges in cardiac biology. In Aim 1, I will determine novel dosage-sensitive gene combinations and their context dependency in cardiac cell types. The resulting map of contextual dosage sensitivity, for genes individually or in combination, has the potential to dramatically improve the interpretation of copy number variants in the genetic diagnosis of cardiac disease. In Aim 2, I will map the dysregulated gene network and discover candidate network-correcting therapeutics in hypertrophic cardiomyopathy, a prototypical rare disease affecting a clinically inaccessible tissue in which progress has been impeded by limited data. This work aims to accelerate the discovery of a much-needed targeted therapeutic for this life-threatening, progressive disease. Overall, my novel deep learning model, Gen...

Key facts

NIH application ID: 10933409
Project number: 5DP5OD036170-02
Recipient: J. DAVID GLADSTONE INSTITUTES
Principal Investigator: Christina Vicky Theodoris
Activity code: DP5
Funding institute: NIH
Fiscal year: 2024
Award amount: $472,500
Award type: 5 (non-competing continuation)
Project period: 2023-09-22 → 2028-07-31