# Distance-based ab initio protein structure prediction

> **NIH R01** · UNIVERSITY OF MISSOURI-COLUMBIA · 2020 · $342,270

## Abstract

Predicting the three-dimensional structures of proteins without using known structures from the
Protein Data Bank (PDB) as templates (ab initio) remains a grand challenge of computational
biology. Whereas template-based modeling is now a mature field, ab initio modeling is a
comparatively nascent one, especially for large proteins with complex topologies and multiple
domains. The need for advances in ab initio modeling is evident: many protein sequences have no
recognizable template in the PDB, and the pace of experimental structure
determination is incommensurate with the scale of the problem. Herein, we propose a new
approach to ab initio modeling that combines novel deep learning architectures, which predict inter-residue
distances and domain boundaries, with robust, iterative optimization methods that
construct tertiary structures from the predicted distances. This project builds on the success of
our current R01, particularly the outstanding performance of the Cheng group in CASP13, the 2018
worldwide protein structure prediction experiment, where our MULTICOM suite ranked among the top
three tertiary structure predictors alongside Google DeepMind’s AlphaFold.
The methods will be implemented as open-source tools for the emerging field of distance-based
ab initio protein structure modeling. We will apply the methods to study protein homo-oligomers
and self-assemblies, based on our novel discovery that the quaternary structure contacts within
homo-oligomers can be predicted by deep learning methods from the co-evolutionary signals
embedded in multiple sequence alignments of protein monomers. Furthermore, we will apply the
methods to predict the folds, functional sites, superfamilies, and protein-protein interactions of
proteins that contain “essential Domains of Unknown Function” (eDUFs), a group of evolutionarily
conserved, essential proteins that represents an important uncharted region of protein
function/fold space. The predictions for a diverse and representative subset of eDUFs will be
experimentally validated through a unique collaboration with the structural biology group of Dr.
Tanner.
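The abstract does not say how tertiary structures are built from predicted distances. The sketch below is a minimal illustration under stated assumptions, not MULTICOM code: it assumes a complete, noise-free distance matrix and uses plain gradient descent on a squared-error "stress" objective, which recovers 3D coordinates up to rotation, translation, and reflection. The functions `stress`, `reconstruct`, `max_distance_error`, and `contact_map` are hypothetical names, and the 8 Å contact cutoff is the common CASP convention for contacts between Cβ atoms.

```python
import math
import random


def stress(coords, dist):
    """Sum of squared deviations between model and target distances over all pairs i < j."""
    n = len(coords)
    return sum((math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def reconstruct(dist, steps=3000, lr=0.01, seed=0):
    """Hypothetical sketch: gradient descent on stress, starting from random 3D coordinates."""
    n = len(dist)
    rng = random.Random(seed)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                diff = [coords[i][k] - coords[j][k] for k in range(3)]
                r = math.sqrt(sum(c * c for c in diff)) or 1e-9  # guard against division by zero
                # d/dx_i of (r - d_ij)^2 is 2 (r - d_ij) (x_i - x_j) / r; x_j gets the opposite sign
                g = 2.0 * (r - dist[i][j]) / r
                for k in range(3):
                    grad[i][k] += g * diff[k]
                    grad[j][k] -= g * diff[k]
        # Take one descent step for every coordinate
        for i in range(n):
            for k in range(3):
                coords[i][k] -= lr * grad[i][k]
    return coords


def max_distance_error(coords, dist):
    """Largest absolute deviation between a model distance and its target."""
    n = len(coords)
    return max(abs(math.dist(coords[i], coords[j]) - dist[i][j])
               for i in range(n) for j in range(i + 1, n))


def contact_map(dist, cutoff=8.0):
    """Pairs closer than `cutoff` angstroms count as contacts; the diagonal is excluded."""
    n = len(dist)
    return [[i != j and dist[i][j] < cutoff for j in range(n)] for i in range(n)]


# Toy input: four residues, each 3.8 Å from the others (a regular tetrahedron).
a = 3.8
target = [[0.0 if i == j else a for j in range(4)] for i in range(4)]
model = reconstruct(target)
print(round(max_distance_error(model, target), 3))
```

Predicted distance maps from deep networks are noisy, incomplete, and often given as binned probabilities. A practical folding step would therefore likely need restraint weighting, multiple restarts, and stereochemical terms, all of which this toy omits.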

## Key facts

- **NIH application ID:** 10051137
- **Project number:** 2R01GM093123-09A1
- **Recipient organization:** UNIVERSITY OF MISSOURI-COLUMBIA
- **Principal Investigator:** Jianlin Cheng
- **Activity code:** R01
- **Funding institute:** National Institute of General Medical Sciences (NIGMS), NIH
- **Fiscal year:** 2020
- **Award amount:** $342,270
- **Award type:** 2 (competing renewal)
- **Project period:** 2010-06-01 → 2024-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10051137

## Citation

> US National Institutes of Health, RePORTER application 10051137, Distance-based ab initio protein structure prediction (2R01GM093123-09A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10051137. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
