Data-driven Computational Modeling and Refinement of Protein Structures on Genomic Scales

NIH RePORTER · NIH · R35 · $376,518

Abstract

A key remaining gap in our understanding of biological systems at the molecular level is how to structurally annotate the “dark” protein families: the portion of protein families unsolved by experimental structure determination techniques and inaccessible to homology modeling. Nearly a quarter of protein families are currently dark, with molecular conformation completely unknown, and this gap is likely to widen with the rapid accumulation of new protein sequences that lack annotated structures. The key challenge now is how to bridge this gap to gain a comprehensive understanding of biology and disease, thereby paving the way to structure-based drug design at genomic scale. Computational protein modeling plays a key role in this effort due to its scalability and genome-wide applicability. My laboratory focuses on the development and application of novel data-driven computational modeling and refinement methods to increase the accuracy and coverage of protein structure prediction at genomic scale, irrespective of homology. Future research focuses on three directions: improving homology-free protein folding using multiscale de novo modeling driven by deep learning-based inter-residue interactions; enhancing low-homology threading, or fold recognition, by formulating new algorithms for remote template identification despite low evolutionary relatedness; and developing methods for high-resolution restrained structure refinement guided by generalized ensemble search to drive computational models to near-experimental accuracy. A proteome-wide computational modeling and refinement effort will be conducted, leveraging our unique access to large-scale supercomputing infrastructure, to build high-confidence models covering the dark protein families, which will be organized in a database for public access.
This comprehensive database of structural annotations will shed light on the structures, functions, and interactions of the dark proteome, with broad implications for drug discovery and human health. Software and web servers will be freely disseminated to help the worldwide community of biomedical researchers apply these methods to their specific research problems, thus multiplying the impact of computational modeling on basic research in biology and medicine. My research program will involve close collaborations with other NIGMS-supported investigators, create training opportunities for the next generation of researchers, including members of underrepresented groups, and foster future research advances in structural bioinformatics and computational biology.

Key facts

NIH application ID: 10456948
Project number: 5R35GM138146-04
Recipient: VIRGINIA POLYTECHNIC INST AND ST UNIV
Principal Investigator: Debswapna Bhattacharya
Activity code: R35
Funding institute: NIH
Fiscal year: 2022
Award amount: $376,518
Award type: 5
Project period: 2020-09-15 → 2025-07-31