Discovering interpretable mechanisms explaining high-dimensional biomolecular data

Project summary. How protein and RNA sequences encode folding, aggregation, and function is a fundamental question with wide-ranging implications for human health. Discovering predictive principles for this encoding requires computational approaches that offer mechanistic insight, especially for the large fraction of intrinsically disordered proteins for which experimental structural information is limited. Yet the complexity and dimensionality of this problem pose fundamental challenges to existing computational methods. The axiomatic approach, which models behavior from first principles, is limited by simulation runtime and by unknown context-dependent parameters. Informatics-based approaches such as deep learning could potentially discover principles by integrating large datasets across scales and complexity. However, these models produce “black box” predictions that (i) are difficult to understand and (ii) generalize poorly beyond their training data (i.e., the well-understood regime). My lab has developed methods to overcome the limitations of both types of approaches. (1) Axiomatic: we developed a statistical physics method to exponentially enhance sampling of protein self-assembly from structurally heterogeneous monomers in molecular dynamics simulations. (2) Informatic: we invented essence neural networks (ENNs) based on neurobiological principles and demonstrated that they overcome the above limitations of deep learning on a wide range of learning tasks, including sequence-to-function prediction.
Using both axiomatic and informatic approaches, in the next five years my lab will tackle three instances of the sequence-structure-function problem: (1) use enhanced-sampling molecular dynamics simulations to discover transition states of neurotoxic oligomer and fibril formation from Abeta and tau peptide monomers; (2) use ENNs to discover the RNA-sequence rules driving RNA-associated tau fibril aggregation in neurodegenerative disease, using tau protein and colocalized RNA sequence datasets; (3) use ENNs to distill the sequence rules determining whether a strain or mutant of beta-lactamase protein can neutralize each antibiotic within a diverse drug panel, and identify potential future antibiotic-resistant mutants. Our long-term goal is to develop an ENN-based platform for automated transformation of data into axioms. Leveraging well-established collaborations with colleagues of wide expertise, we will pursue these goals by combining our unique computational approaches with experimental resources, including time-resolved protein aggregation assays, patient-derived tau fibrils colocalized with sequence-specific RNA, high-throughput liquid culture antibiotic screens, multiplexed directed evolution experiments of antibiotic resistance, and large in-house libraries of peptide and RNA mutants. This work lays the foundation for transforming large datasets into human-understandable ru...