# Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design

> **NIH R01** · UNIVERSITY OF COLORADO · 2022 · $720,929

## Abstract

The study of biomolecular interactions and the design of new therapeutics require accurate physical models of the atomistic interactions between small molecules and biological macromolecules. Over the last few decades, molecular mechanics force fields have demonstrated the potential that physical models hold for quantitative biophysical modeling and predictive molecular design. However, a significant technology gap exists in our ability to build force fields that achieve high accuracy, can be systematically improved in a statistically robust manner, can be extended to new areas of chemistry, can model post-translational and covalent modifications, are able to quantify systematic errors in predictions, and can be broadly applied across high-performance software packages.

In this project, we aim to bridge this technology gap to enable new generations of accurate quantitative biomolecular modeling and (bio)molecular design for chemical biology and drug discovery. In Aim 1, we will produce a modern, open infrastructure to enable practitioners to rapidly and conveniently construct and employ accurate and statistically robust physical force fields via automated machine learning methods. In Aim 2, we will construct open, machine-readable experimental and quantum chemical datasets that will accelerate next-generation force field development. In Aim 3, we will develop statistically robust Bayesian inference techniques to enable the automated construction of type assignment schemes that avoid overfitting and the selection of physical functional forms statistically justified by the data. This approach will also allow the systematic error in predicted properties arising from uncertainty in parameters or functional form choices (generally the dominant source of error) to be quantified with little added expense. In Aim 4, we will integrate and apply this infrastructure to produce open, transferable, self-consistent force fields that achieve high accuracy and broad coverage for modeling small molecule interactions with biomolecules (including unnatural amino or nucleic acids and covalent modifications by organic molecules), with the ultimate goal of covering all major biomolecules.

This research is significant in that the technology developed in this project has the potential to radically transform the study of biomolecular phenomena by providing highly accurate force fields with exceptionally broad chemical coverage via fully consistent parameterization of organic (bio)molecules. In addition, we will produce new tools to automate force field creation and tailoring to specific problem domains, quantify the systematic error in predictions, and identify new data for improving force field accuracy. This will greatly improve our ability to study diverse biophysical processes at the molecular level, and to rationally design new small-molecule, protein, and nucleic acid therapeutics. This approach will bring statistical rigor to the field of force fi...

## Key facts

- **NIH application ID:** 10356089
- **Project number:** 5R01GM132386-03
- **Recipient organization:** UNIVERSITY OF COLORADO
- **Principal Investigator:** Michael R Shirts
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $720,929
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2020-03-01 → 2024-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10356089

## Citation

> US National Institutes of Health, RePORTER application 10356089, Open data-driven infrastructure for building biomolecular force fields for predictive biophysics and drug design (5R01GM132386-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10356089. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
