# Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design

> **NIH R01** · UNIVERSITY OF COLORADO · 2022 · $10,780

## Abstract

This Project Summary is unchanged from the original R01. The study of biomolecular interactions and design
of new therapeutics requires accurate physical models of the atomistic interactions between small molecules and
biological macromolecules. Over the last few decades, molecular mechanics force fields have demonstrated the
potential that physical models hold for quantitative biophysical modeling and predictive molecular design. However,
a significant technology gap exists in our ability to build force fields that achieve high accuracy, can be
systematically improved in a statistically robust manner, can be extended to new areas of chemistry, can model
post-translational and covalent modifications, are able to quantify systematic errors in predictions, and can be
broadly applied across high-performance software packages. In this project, we aim to bridge this technology gap
to enable new generations of accurate quantitative biomolecular modeling and (bio)molecular design for chemical
biology and drug discovery. In Aim 1, we will produce a modern, open infrastructure to enable practitioners to
rapidly and conveniently construct and employ accurate and statistically robust physical force fields via automated
machine learning methods. In Aim 2, we will construct open, machine-readable experimental and quantum chemical
datasets that will accelerate next-generation force field development. In Aim 3, we will develop statistically
robust Bayesian inference techniques to enable the automated construction of type assignment schemes that avoid
overfitting and the selection of physical functional forms statistically justified by the data. This approach will
also allow the systematic error in predicted properties arising from uncertainty in parameters or functional form
choices (generally the dominant source of error) to be quantified with little added expense. In Aim 4, we will
integrate and apply this infrastructure to produce open, transferable, self-consistent force fields that achieve high
accuracy and broad coverage for modeling small molecule interactions with biomolecules (including unnatural
amino or nucleic acids and covalent modifications by organic molecules), with the ultimate goal of covering all
major biomolecules.
This research is significant in that the technology developed in this project has the potential to radically transform
the study of biomolecular phenomena by providing highly accurate force fields with exceptionally broad chemical
coverage via fully consistent parameterization of organic (bio)molecules. In addition, we will produce new tools to
automate force field creation and tailoring to specific problem domains, quantify the systematic error in predictions,
and identify new data for improving force field accuracy. This will greatly improve our ability to study diverse
biophysical processes at the molecular level, and to rationally design new small-molecule, protein, and nucleic
acid therapeutics. Thi...

## Key facts

- **NIH application ID:** 10592758
- **Project number:** 3R01GM132386-03S1
- **Recipient organization:** UNIVERSITY OF COLORADO
- **Principal Investigator:** Michael R Shirts
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $10,780
- **Award type:** 3
- **Project period:** 2020-03-01 → 2024-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10592758

## Citation

> US National Institutes of Health, RePORTER application 10592758, Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design (3R01GM132386-03S1). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/10592758. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
