Project Summary
Rapidly growing genomic and phenotypic data from large-scale biobank efforts are increasingly associating genetic variants with the predisposition to, onset of, and progression of human diseases. However, the mechanisms underlying these associations remain poorly understood, and many more variants are still of uncertain clinical significance. This widening gap between variant data and mechanistic knowledge hinders the delivery of prognostics, diagnostics, and therapeutics needed to meet growing healthcare demands. To address this gap, the PI's long-term research goal is twofold: (1) to unravel how a genetic change propagates across atomic, molecular, and cellular levels to cause human disease or confer drug resistance; and (2) to translate the resulting mechanistic knowledge into effective therapeutic strategies against human diseases and drug resistance. Toward this goal, this project focuses on coding variants that lead to protein mutations. Building on our recent progress in physics-driven protein design and data-driven machine learning of mutational effects, it aims to broaden and deepen the mechanistic understanding of disease-associated protein mutations along three directions. The first two directions involve forward prediction and causal inference of hierarchical mutational phenotypes across molecular and cellular levels, which will predict disease phenotypes while generating mechanistic hypotheses. The third direction involves the inverse design of perturbation experiments, including protein mutagenesis and ligand perturbation, to test the generated mechanistic hypotheses directly and rationally. To advance along these three directions over the next five years, we will integrate molecular physics, systems knowledge, and emerging large-scale functional data into a systematic and rigorous framework that probabilistically predicts, explains, and designs the phenotypes of protein mutations across biological scales.
To this end, we will combine molecular modeling, network analysis, multimodal machine learning, graph learning, and conditional generative models, while continuing experimental and clinical collaborations within our teams and the broader community. Beyond the computational methods, predicted phenotypes, hypothesized mechanisms, and designed experiments, the expected contributions of the project include an integrated data platform accessible to cross-disciplinary machine learning research and a resource and discovery platform that promotes clinical feedback loops.