Unraveling molecular and system-level mechanisms of human disease-associated protein mutations

NIH RePORTER · NIH · R35 · $392,641

Abstract

Project Summary

Rapidly growing genomic and phenotypic data from large-scale biobank efforts are increasingly associating genetic variants with the predisposition, onset, and progression of human diseases. However, knowledge of the mechanisms underlying those associations remains limited, and many more variants are of uncertain clinical significance. The widening gap between variant data and mechanistic knowledge hinders the delivery of prognostics, diagnostics, and therapeutics for growing healthcare demands. In response to this knowledge gap, the PI’s long-term research goal is twofold: (1) to unravel how a genetic change ripples across atomic, molecular, and cellular levels to cause human diseases or confer drug resistance; and (2) to translate the resulting mechanistic knowledge into effective therapeutic strategies for human diseases and drug resistance. Toward this long-term goal, the project focuses on coding variants that lead to protein mutations, builds upon our recent progress in physics-driven protein design and data-driven machine learning for mutational effects, and proposes to widen and deepen the unraveling of disease-associated protein mutations along three directions. The first two directions involve forward prediction and causal inference of hierarchical mutational phenotypes across molecular and cellular levels, which will generate mechanistic hypotheses while predicting disease phenotypes. The third direction involves inverse design of perturbation experiments, including protein mutagenesis and ligand perturbation, to test the generated mechanistic hypotheses directly and rationally. To advance along the three directions in the next five years, we will integrate molecular physics, systems knowledge, and emerging large-scale functional data in a systematic and rigorous framework to probabilistically predict, explain, and design phenotypes of protein mutations across biological scales. To this end, we will fuse molecular modeling, network analysis, multimodal machine learning, graph learning, and conditional generative models, while continuing experimental and clinical collaborations in teams and communities. Beyond the computational methods, predicted phenotypes, hypothesized mechanisms, and designed experiments, the expected contributions of the project include an integrated data platform friendly to cross-disciplinary machine learning and a resource and discovery platform that promotes clinical feedback loops.

Key facts

NIH application ID: 10842825
Project number: 2R35GM124952-06
Recipient: TEXAS ENGINEERING EXPERIMENT STATION
Principal Investigator: Yang Shen
Activity code: R35
Funding institute: NIH
Fiscal year: 2024
Award amount: $392,641
Award type: 2
Project period: 2017-09-15 → 2029-08-31