# Computational Methods for Microbial and Microbiome Sequence Analysis

> **NIH R35** · JOHNS HOPKINS UNIVERSITY · 2020 · $403,409

## Abstract

Project Summary
This project will support our work on computational methods for microbial sequence analysis, including gene
finding, whole-genome alignment, genome assembly, and metagenomic sequence analysis. Over the years we
have developed multiple systems to solve problems in these areas, some of which are very widely used. These
tools need continued updates and improvements to keep pace with changes in sequencing technology, changes
in experimental design, and the ever-growing number of sequenced genomes. One of these systems is Glimmer,
a computational method for finding genes in bacteria, viruses, archaea, and simple eukaryotes. Glimmer is
highly accurate, finding over 99% of the genes in most prokaryotic genomes. It has been used by thousands of
scientists around the world and in the majority of published bacterial genome sequencing projects over the past
decade. Collectively, the three main publications describing Glimmer have been cited over 4,700 times,
including >700 citations in 2016-17 alone. Usage of Glimmer has increased in recent years due to the
explosion in next-generation sequencing projects, which are particularly cost-effective for bacterial genomes. A
second system, MUMmer, is an efficient whole-genome aligner that is used to compare genomes to one another
and to compare genome assemblies to detect changes, both large and small. MUMmer and its components,
especially Nucmer, have been widely used and incorporated in other systems, including multi-genome aligners
and several genome assembly packages. The three main publications describing MUMmer have been cited
over 3,600 times, including >750 citations in 2016-17. In recent years we have focused our efforts on
developing methods for the analysis of metagenomics data, producing several newer tools, including Kraken
and Centrifuge. Both of these systems attempt to assign a species identifier to every read in a metagenomics
data set. Because the Kraken algorithm is not only accurate but far faster than earlier methods, it was rapidly
adopted by many labs soon after its release, and its usage continues to grow. The even newer and more
space-efficient Centrifuge system has also been highly successful and was recently incorporated into the analysis
package of one of the new third-generation sequencing companies. We continue to work on improving the
performance of both algorithms, and this project will allow us to extend them to handle the newest long-read
data that is increasingly being used for metagenomics experiments. Finally, a new direction of the lab is the use
of metagenomic shotgun sequencing to diagnose infections, for which we are not only modifying our
algorithms, but also building customized genome databases where we rigorously screen the genomes to identify
and remove contaminants and low-complexity sequences that create false positives. As we have done for many
years, we will release all of the software and data generated by this project for free under an open sou...

## Key facts

- **NIH application ID:** 9858369
- **Project number:** 5R35GM130151-02
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Steven L. Salzberg
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $403,409
- **Award type:** 5
- **Project period:** 2019-02-01 → 2024-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9858369

## Citation

> US National Institutes of Health, RePORTER application 9858369, Computational Methods for Microbial and Microbiome Sequence Analysis (5R35GM130151-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9858369. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*