Resolving and understanding the genomic basis of heterogeneous complex traits and diseases

NIH RePORTER · NIH · R35 · $234,750

Abstract

Parent grant: Resolving and understanding the genomic basis of heterogeneous complex traits and diseases

Hundreds of genomics studies have exposed major gaps in our understanding of the mechanistic relationships between genomic variation, cellular processes, tissue function, and trait variation. The goal of the parent project is to develop a suite of computational frameworks that integrate massive collections of genomic and biomedical data to make the following three advances:

Direction 1: Discern and leverage mechanism-based subtypes of complex traits and diseases.
Direction 2: Characterize physiology and disease along the human lifespan and across the sexes.
Direction 3: Find analogous contexts in model organisms for studying human traits/diseases.

As demonstrated by us and others, genome-wide molecular networks are grand unifiers of molecular data and knowledge, and serve as powerful tools to contextually understand the roles genes play in cellular pathways, tissue physiology, phenotype/disease mechanisms, and drug action. Hence, a central aspect of our parent project is to develop multiple machine learning approaches that leverage molecular networks to generate accurate, testable hypotheses about the roles genes play in defining subtypes, age/sex differences, and cross-species analogs of a range of complex disorders.

As part of this work, we have developed GenePlexus (github.com/krishnanlab/GenePlexus), open-source software to run and benchmark our state-of-the-art approach for combining genome-scale networks with supervised machine learning (ML) to obtain accurate novel predictions about various gene attributes (e.g., pathway membership or disease association; Liu*, Mancuso*, et al., 2020, Bioinformatics). Similarly, our group has committed to making all our other computational methods available to the broader biomedical research community as software tools for open science. We have released such software with nearly all our papers. Other recent examples include:

● PecanPy (github.com/krishnanlab/PecanPy) for parallelized, efficient, and accelerated node2vec.
● Expresto (github.com/krishnanlab/Expresto) for imputing unmeasured genes in transcriptomes.
● Txt2Onto (github.com/krishnanlab/Txt2Onto) for annotating -omics samples based on free-text metadata.

Goal of the supplement project and current prototype
GenePlexus: A cloud platform for network-based machine learning

The goal of the proposed supplement is to take our software development to the next level by building a new cloud-based GenePlexus platform to enable: i) biomedical/experimental researchers to seamlessly take advantage of network-based ML to generate interpretable genome-wide predictions, and ii) computational researchers to run network-based ML, retrieve results, and integrate them with existing data analysis workflows. The project team, which includes the PI, a postdoc trained in cloud computing, and two professional software engineers, has worked together over the past...

Key facts

NIH application ID: 10406616
Project number: 3R35GM128765-04S1
Recipient: MICHIGAN STATE UNIVERSITY
Principal Investigator: Arjun Krishnan
Activity code: R35
Funding institute: NIH
Fiscal year: 2021
Award amount: $234,750
Award type: 3
Project period: 2018-08-15 → 2023-07-31