# Protein Sequence Matching

> **NIH R01** · IOWA STATE UNIVERSITY · 2020 · $307,491

## Abstract

Combining information from the vast body of protein sequences within the framework of protein structures
enables a deeper understanding of the complex effects of amino acid substitutions. Compiling the sequence
correlations within protein structural domains will help distinguish neutral from deleterious
changes. Protein structures provide the frameworks for understanding the sequence data, through physical
proximity of directly interacting amino acids and in the manifestation of allostery. This will transform sequence
matching from a 1-D process to a 3-D process. Due to the rapid advances in sequencing, the large
numbers of available genomes now provide hundreds of millions of protein sequences, and similar advances in
structural biology now provide 100,000+ protein structures. By combining these data, our preliminary results
show that accounting for pairwise sequence correlations between residues that interact closely in the protein
structures immediately improves the ability to identify similar structures by sequence matching.
Other preliminary data show that function identification by sequence matching is also improved. Such improved
homolog identification can lead to progress in structure prediction. The overarching goal here is to apply a
deep knowledge of protein structure, together with the analyses of the available sequence data, to the
important problem of protein sequence matching. We take an entirely new, highly innovative and uniquely
multi-faceted approach to this important problem. It is well established that physical factors such as dense
amino acid packing and other physical aspects of structures affect the conservation of amino acids, and these
are accounted for in the new approaches to sequence matching taken here. The rationale is that protein
structures provide the physical information and the framework needed to incorporate aspects of 3-D
structure and allostery into sequence matching. Accounting for protein
flexibility and conformational dynamics will further broaden the investigated conformational space, as well as
provide a better understanding of the correlations important for sequence evolution. Results from this project
will improve the practice of molecular biology, particularly the identification of functions of proteins having no
assigned function, and this is certain to have major impacts upon the understanding of evolution. This project
will apply innovative new methods for extracting correlations in sequence, structure and dynamics, by
data mining of sequences and structures. The novel structure-based approaches will enable major advances in
sequence matching that will be implemented and disseminated on new web servers, made available to
anyone. The outcomes of the project will enable any scientist to discriminate significantly more effectively
between similar and dissimilar sequences. This better discrimination is essential for better function predic...
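
The move from 1-D to 3-D matching described in the abstract can be illustrated with a toy score. The sketch below is not the project's method: the function names, the identity-based site score, the `weight` parameter, and the pair table values are all hypothetical, chosen only to show how a term for residue pairs in structural contact can be added to a per-position score. It assumes an ungapped alignment and a contact list taken from a template structure.

```python
from typing import Dict, List, Tuple

Contact = Tuple[int, int]                  # 0-based positions in contact in the template structure
PairTable = Dict[Tuple[str, str], float]   # hypothetical scores for residue pairs in contact


def pair_term(a: str, b: str, table: PairTable) -> float:
    """Look up a residue-pair score, treating (a, b) and (b, a) as the same pair."""
    return table.get((a, b), table.get((b, a), 0.0))


def contact_aware_score(
    query: str,
    template: str,
    contacts: List[Contact],
    table: PairTable,
    weight: float = 0.5,                   # assumed relative weight of the pair term
) -> float:
    """Score an ungapped alignment with a 1-D identity term plus a contact-pair term."""
    if len(query) != len(template):
        raise ValueError("sequences must be aligned to the same length")
    # 1-D part: count positions where the query matches the template.
    one_d = sum(1.0 for q, t in zip(query, template) if q == t)
    # 3-D part: score the query residues found at positions that touch in the structure.
    three_d = sum(pair_term(query[i], query[j], table) for i, j in contacts)
    return one_d + weight * three_d


# Toy data: positions 1 and 3 are assumed to be in contact; K-E is a favorable pair.
template = "AKLEV"
contacts = [(1, 3)]
table = {("K", "E"): 1.0}

identical = contact_aware_score("AKLEV", template, contacts, table)
swapped = contact_aware_score("AELKV", template, contacts, table)
broken = contact_aware_score("AKLAV", template, contacts, table)
```

In this toy setup the query `AELKV` matches the template at only three positions, yet it still earns credit from the pair term because the swapped K and E remain a favorable pair at the contacting positions. A purely positional count would miss that compensation.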

## Key facts

- **NIH application ID:** 9851415
- **Project number:** 5R01GM127701-03
- **Recipient organization:** IOWA STATE UNIVERSITY
- **Principal Investigator:** ROBERT L JERNIGAN
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $307,491
- **Award type:** 5
- **Project period:** 2018-02-01 → 2022-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9851415

## Citation

> US National Institutes of Health, RePORTER application 9851415, Protein Sequence Matching (5R01GM127701-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9851415. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*