# HMMER and Infernal: Finding distant homologs of sequences and RNA structures

> **NIH R01** · HARVARD UNIVERSITY · 2022 · $533,450

## Abstract

Genome sequence data is now available for hundreds of thousands of species. Our ability to exploit this vast trove
of information about the molecular basis and evolution of life depends on sophisticated computational analysis
tools. One important class of tools is profile analysis software, for making consensus statistical models of multiple
alignments of biological sequence families, and for using those models to sensitively detect homologs and make
deep multiple alignments. Profile analysis derives its power from the fact that despite the unbounded growth
of sequence data, the majority of functional sequences can be condensed into a manageably small number of
conserved families. Profile software underlies numerous protein, RNA, and DNA sequence family databases. The
systematic availability of deep multiple alignments (of many thousands of sequences) is enabling revolutionary
advances in predicting molecular function and 3D structure by comparative sequence analysis.

The HMMER and Infernal software packages from our laboratory are some of the most widely used tools
for profile analysis. HMMER implements profile hidden Markov models (profile HMMs) of primary sequence
consensus, typically for protein domains and conserved DNA elements. Infernal implements profile stochastic
context-free grammars (profile SCFGs) of RNA secondary structure and sequence consensus. In the context of
the continued development of these packages, this proposal has three specific aims for new lines of research
that we expect to lead to major improvements in the accuracy, utility, and computational efficiency of profile
analysis. The first aim proposes to develop a discontinuous Markov model of nonhomologous sequences, to improve
the ability to distinguish homologs from nonhomologs and reduce the false positive rate of database searches.
The second aim proposes to develop sketching methods for efficiently representing the voluminous results of a
database homology search with a subset of the most phylogenetically informative hits. The third aim proposes
to develop adaptive computation methods to flexibly harness the complex mix of CPU/GPU processors, memory,
and storage in modern hardware architectures, enabling efficient scalable computation and near-interactive
database search times.

## Key facts

- **NIH application ID:** 10487574
- **Project number:** 5R01HG009116-06
- **Recipient organization:** HARVARD UNIVERSITY
- **Principal Investigator:** Sean R Eddy
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $533,450
- **Award type:** 5
- **Project period:** 2016-09-16 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10487574

## Citation

> US National Institutes of Health, RePORTER application 10487574, HMMER and Infernal: Finding distant homologs of sequences and RNA structures (5R01HG009116-06). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10487574. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
