# Novel Use of Genome Information to Understand Mutations

> **NIH R01** · IOWA STATE UNIVERSITY · 2022 · $463,878

## Abstract

There are significant advantages to translating genome sequences into proteins, for which there is a large body
of accumulated knowledge regarding the relationships among sequence, structure and function. Advances in
genome sequencing are producing a deluge of data that can be used to train and test prediction methods to
identify the characteristics of various mutants by building on the large body of functional protein data. Clinicians
need to know the functional behavior of mutants: whether they are neutral or deleterious, and whether they affect
protein structure, protein dynamics, or protein binding specificity.
Protein structures have local environments for each amino acid in the sequence, and usually amino acids at
each position are compatible with their local environment. This leads to strongly correlated amino acids as
manifested in the multiple sequence alignments. This project will combine protein sequence and structure data
together with amino acid properties and their correlations to characterize each site in the protein structure to
investigate the hypothesis that outliers in the distributions over the important amino acid properties for each
position will negatively impact functionality, i.e. they will be deleterious mutants. The project will drill down
deeply to learn the nature of the impaired mechanism. Two diverse approaches will be taken in the two
aims: Aim 1 will investigate the amino acid property distributions to identify the properties that best characterize
each position in the sequence and structure, and determine how the outliers negatively impact the functional
structures, dynamics and binding characteristics. Preliminary results show that the deleterious mutants usually
have a significantly broader range of single amino acid properties. Data from these
analyses will be fed into Aim 2, where two types of machine learning approaches, Extreme Learning Machines
and Random Forests, will be jointly applied. Preliminary results show that incorporating just one amino acid
property yields significant gains over existing methods. One of the major strengths of this project is that results
from the two Aims will be exchanged frequently to achieve improved predictions for both approaches. The
project builds on the long experience of the PIs in data mining from protein structures and sequences, as well
as previous machine learning applications. Important potential outcomes include a more reliable, more
informed understanding of how mutants affect function. In addition, the project aims to predict connections of
mutants to specific diseases. The results of the project will be important for drug development, because the
specific part of the protein where function is impaired will be identified, allowing drug developers to narrow their
focus to more limited parts of a protein that is targeted for drug design. The predictors established by this
project will also have the ...
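
The central hypothesis, that a mutant residue whose property value is an outlier relative to the residues observed at that position is likely to be deleterious, can be illustrated with a small sketch. The abstract does not say which properties, scoring function, or cutoff the project uses, so everything here is an assumption made for illustration: the Kyte-Doolittle hydropathy scale stands in for "an amino acid property", a z-score stands in for the outlier measure, the threshold of 2.0 is arbitrary, and the function names `property_z_score` and `is_outlier` are hypothetical.

```python
import math
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy values, used here only as one example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_z_score(column, mutant, scale=KYTE_DOOLITTLE):
    """Z-score of the mutant's property value against one alignment column.

    `column` holds the residues observed at a single position of a multiple
    sequence alignment; gaps and unknown symbols are skipped.
    """
    values = [scale[aa] for aa in column if aa in scale]
    if len(values) < 2:
        raise ValueError("need at least two residues to estimate a spread")
    mu = mean(values)
    sigma = pstdev(values)
    diff = scale[mutant] - mu
    if sigma == 0:
        # Fully conserved column: any change in the property is maximally unusual.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sigma


def is_outlier(column, mutant, threshold=2.0):
    """Flag the mutant when its property lies far outside the column's spread.

    The threshold is an arbitrary illustrative cutoff, not a value from the project.
    """
    return abs(property_z_score(column, mutant)) > threshold


# A buried, hydrophobic position: substituting aspartate is flagged,
# while substituting valine is not.
column = "IIVLLIVLMI"
print(is_outlier(column, "D"), is_outlier(column, "V"))
```

A per-site score of this kind is also the sort of single-property feature that could be passed to a classifier such as a Random Forest, although the abstract does not describe the actual feature set.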

## Key facts

- **NIH application ID:** 10488281
- **Project number:** 5R01HG012117-02
- **Recipient organization:** IOWA STATE UNIVERSITY
- **Principal Investigator:** ROBERT L JERNIGAN
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $463,878
- **Award type:** 5
- **Project period:** 2021-09-13 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10488281

## Citation

> US National Institutes of Health, RePORTER application 10488281, Novel Use of Genome Information to Understand Mutations (5R01HG012117-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10488281. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*