# Discovering interpretable mechanisms explaining high dimensional biomolecular data

> **NIH R35** · UT SOUTHWESTERN MEDICAL CENTER · 2024 · $410,000

## Abstract

Project summary. How protein and RNA sequence encodes folding, aggregation, and function is a fundamental
question with wide-ranging human health implications. Discovering predictive principles for this encoding
requires computational approaches that offer mechanistic insight, especially for the large fraction of intrinsically
disordered proteins for which experimental structural information is limited. Yet the complexity and dimensionality
of this problem pose fundamental challenges to existing computational methods. The axiomatic approach,
modeling behavior from first principles, is limited by simulation runtime and unknown context-dependent
parameters. Informatics-based approaches such as deep learning could potentially discover principles by
integrating large datasets across scales and complexity. However, these models produce “black box” predictions
that i) are difficult to understand and ii) generalize poorly beyond their training data (i.e., the well-understood regime).

My lab has developed methods to overcome the limitations of both types of approaches. (1) Axiomatic: we
developed a statistical physics method to exponentially enhance sampling of protein self-assembly from
structurally heterogeneous monomers in molecular dynamics simulations. (2) Informatic: we invented essence
neural networks (ENNs) based on neurobiological principles and demonstrated that they overcome the above
limitations of deep learning on a wide range of learning tasks, including sequence-to-function prediction.

Using both axiomatic and informatic approaches, in the next five years my lab will tackle three instances
of the sequence-structure-function problem: 1) Use enhanced sampling molecular dynamics simulations to
discover transition states of neurotoxic oligomer and fibril formation of Abeta and tau peptide monomers; 2) Use
ENNs to discover the RNA-sequence rules driving RNA-associated tau fibril aggregation in neurodegenerative
disease using tau protein and colocalized RNA sequence datasets; 3) Use ENNs to distill the sequence rules
determining whether a strain or mutant of beta-lactamase protein can neutralize each antibiotic within a diverse
drug panel, and identify potential future antibiotic-resistant mutants. Our long-term goal is to develop an
ENN-based platform for automated transformation of data into axioms. Leveraging well-established collaborations
with colleagues of wide expertise, we will pursue these goals by combining our unique computational approaches
with experimental resources, including time-resolved protein aggregation assays, patient-derived tau fibrils
co-localized with sequence-specific RNA, high-throughput liquid culture antibiotic screens, multiplexed directed
evolution experiments of antibiotic resistance, and large in-house libraries of peptide and RNA mutants.

This work lays the foundation for transforming large datasets into human-understandable ru...

## Key facts

- **NIH application ID:** 10910030
- **Project number:** 5R35GM150897-02
- **Recipient organization:** UT SOUTHWESTERN MEDICAL CENTER
- **Principal Investigator:** Milo Lin
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $410,000
- **Award type:** 5
- **Project period:** 2023-09-01 → 2028-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10910030

## Citation

> US National Institutes of Health, RePORTER application 10910030, Discovering interpretable mechanisms explaining high dimensional biomolecular data (5R35GM150897-02). Retrieved via AI Analytics 2026-05-28 from https://api.ai-analytics.org/grant/nih/10910030. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
