# New Computational Methods for Data-driven Protein Structure Prediction

> **NIH R01** · TOYOTA TECHNOLOGICAL INSTITUTE / CHICAGO · 2021 · $321,599

## Abstract

Proteins play fundamental roles in all biological processes. An accurate description of protein
structure is an important step toward understanding biological life and is highly relevant to the
development of therapeutics and drugs. Although experimental structure determination has
been greatly improved, there is still a very large gap between the number of available protein
sequences and that of solved protein structures, which can only be filled by computational
prediction. The long-term goal of this project is to apply machine learning and optimization
algorithms to understand the protein sequence-structure-function relationship by analyzing
sequence, structure and functional data and to develop data-driven computational methods and
tools for structure and function prediction. We believe that by developing sophisticated
algorithms to extract knowledge from the growing body of sequence and structure data, we can model
the protein sequence-structure relationship very accurately and greatly improve structure and
function prediction. This project has already produced a few CASP-winning, widely used data-driven
algorithms and web servers (http://raptorx.uchicago.edu) for protein structure modeling.
This renewal will further develop machine learning (especially deep learning) algorithms for
protein structure modeling without good templates. The specific aims are: (1) developing deep
learning (DL) algorithms for the prediction of protein contact and distance matrices; (2) developing
distance-based algorithms for fast and accurate ab initio folding of proteins without templates; (3)
developing DL algorithms for template-based modeling with only weakly similar templates. This
renewal will lead to further understanding and new models of the protein sequence-structure
relationship and yield publicly available resources for automated, accurate, quantitative analysis
of a wide range of proteins. The impact will be multiplied by tens of thousands of users
worldwide employing our web servers to study a wide variety of proteins relevant to basic biological
research and human diseases, in both low- and high-throughput experiments.
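
Aim (1) refers to protein contact and distance matrices. As a rough illustration of what those objects are (not the project's prediction method, which the abstract does not describe), the sketch below computes a pairwise distance matrix from 3D coordinates and thresholds it into a binary contact map. The function names `distance_matrix` and `contact_map`, the toy coordinates, and the 8 Å cutoff are assumptions made for this example; 8 Å is a commonly used contact definition, but the abstract does not state one.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(dist, threshold=8.0, min_separation=1):
    """Threshold a distance matrix into a binary contact map.

    Two residues i and j count as "in contact" when their distance is below
    `threshold` and they are at least `min_separation` positions apart in
    the sequence. Both defaults are illustrative assumptions.
    """
    n = len(dist)
    return [
        [
            1 if abs(i - j) >= min_separation and dist[i][j] < threshold else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy "protein": four residues on a straight line, 3.8 units apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dist = distance_matrix(coords)
contacts = contact_map(dist)
```

A predictor of the kind named in aim (1) would estimate such a matrix from sequence alone; here it is derived from known coordinates only to show the data structure.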

## Key facts

- **NIH application ID:** 10246779
- **Project number:** 5R01GM089753-11
- **Recipient organization:** TOYOTA TECHNOLOGICAL INSTITUTE / CHICAGO
- **Principal Investigator:** JINBO XU
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $321,599
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2010-05-14 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10246779

## Citation

> US National Institutes of Health, RePORTER application 10246779, New Computational Methods for Data-driven Protein Structure Prediction (5R01GM089753-11). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10246779. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
