Collaborative Research: Mathematical Framework for Biomolecules: From Protein to RNA to Chromosomes

NIH RePORTER · NIH · R01 · $308,756

Abstract

Despite rapid progress in structural bioinformatics, our current toolbox lacks a rigorous and unifying mathematical and statistical framework for the analysis, classification, and organization of individual biomolecules as well as groups of biomolecules. We have recently developed such a framework, based on elastic shape analysis (ESA), for the comparison of protein and RNA structures. Under this framework, the formal geodesic distance between any two protein/RNA structures can be computed rapidly. Probability distributions can also be built for families of protein/RNA structures and used to classify structures in a principled way through statistical hypothesis testing. In addition, sequence information can be naturally incorporated so that structures can be compared in the joint sequence-structure space. We have also developed novel algorithms for matching and analyzing protein surfaces. We propose to develop these methodologies significantly further for important applications in structural biology, including studying chromosome structures by combining both 3D structure and sequence-level information. The proposed research will make significant contributions to the following areas: (1) This proposal will fill an important gap in structural biology: the lack of a rigorous mathematical and statistical framework for biomolecular structure comparison; (2) Our proposed unifying framework will allow natural incorporation of sequence information for structure comparison; (3) Our approach can uncover distinct clusters at the deepest level of the current classification scheme (i.e., the SCOP family), enabling a finer classification of biomolecular structures. 
Preliminary results indicate that, by using carefully measured structural similarity, we will obtain representative sets of proteins of higher quality than those obtained by current sequence-similarity-based methods; (4) The probabilistic models designed for protein/RNA backbone structures and surfaces will capture the flexible nature of protein structures through the use of ensembles of conformations, while maintaining high computational efficiency. These models will also enable effective characterization of family-specific variations among proteins, an important task that none of the existing methods perform well; (5) Protein/RNA structures will be organized in network-based data structures using probabilistic approaches. This new organization will effectively integrate sequence, backbone structure, and surface information, facilitating the discovery of novel insights; and (6) these new developments will be rapidly generalized for studying chromosome structures. This proposed research will allow development of tools that will also be applicable in other areas of shape analysis, including medical image analysis, computer vision, and pattern recognition. Our work will help to increase the communication between the field of protein structure analysis and the field of shape analysis, and will stimulate more cross-over d...

Key facts

NIH application ID: 10189648
Project number: 5R01GM126558-05
Recipient: FLORIDA STATE UNIVERSITY
Principal Investigator: Jinfeng Zhang
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $308,756
Award type: 5
Project period: 2017-07-01 → 2023-06-30