# Statistical methods to enhance reproducible microbiome discovery

> **NIH R35** · UNIVERSITY OF WASHINGTON · 2022 · $306,265

## Abstract

PROJECT SUMMARY
The microbiome, which plays an important role in human health and disease, is generally
characterized using high throughput genome sequencing. However, the laboratory processes
required for microbial metagenomic sequencing can introduce spurious measurement noise due
to, for example, DNA extraction, amplification, sequencing depth, GC bias, batch effects,
laboratory protocols, and bioinformatics processing. Without correction, the magnitude of
sample- and study-specific variation can easily exceed the magnitude of variation due to
treatment or disease status. Therefore, diagnosis and treatment of diseases and infections
based on microbial sequencing are impeded by spurious noise that masks true biological signal.
The overall goals of this research are to develop new statistical methods for the analysis of
microbiome data, including taxonomic, functional, and metabolic data. Our statistical models will
explicitly model batch and technical variation, allowing us to distinguish, rather than conflate,
biological signal and non-biological noise. Our new models will leverage commonly collected
sequence data, such as positive controls and technical replicates, which are not typically utilized
by researchers in their statistical analysis of microbiome data. By designing statistical methods
that use existing data sources, we will reduce the amount and cost of sequencing required to
detect true biological signals. Our models will allow us to perform hypothesis testing for
differential abundance of microbial genes, strains, and metabolites, as well as shifts in the
diversity of microbial communities, without discarding biological signal or detecting spurious
technical noise due to imperfect laboratory protocols and instrumentation. The methods are
applicable to a broad range of experimental designs (including observational and longitudinal),
biomedical research methods (including model systems and clinical trials), and sequencing
platforms (including marker gene and whole genome sequencing as well as spectrometric
methods for metabolic and proteomic profiling). Our statistical methods will be distributed as
freely available, open-source software with extensive tutorials and forums for
user questions. By avoiding detection of signals due to sample- and study-specific artefacts,
our methods will increase the reproducibility of microbiome research and facilitate the
identification of therapeutic and diagnostic opportunities in microbiome science.
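The abstract does not specify the models themselves. As a minimal, hypothetical sketch of how a positive control can help separate technical distortion from biology, the Python below assumes a multiplicative bias model, in which each taxon's observed share is its true share scaled by a taxon-specific detection efficiency and then renormalized. The function names `estimate_efficiencies` and `calibrate`, and the mock-community counts, are illustrative assumptions, not the project's software, API, or data.

```python
import numpy as np


def estimate_efficiencies(observed_control, true_control):
    """Estimate relative detection efficiencies from a positive control.

    Assumption (hypothetical sketch): observed proportions are proportional
    to (true proportion * taxon efficiency). Efficiencies are identifiable
    only up to a common constant, so they are rescaled to a geometric mean of 1.
    """
    obs = np.asarray(observed_control, dtype=float)
    true = np.asarray(true_control, dtype=float)
    if np.any(obs <= 0) or np.any(true <= 0):
        raise ValueError("every taxon must be present in the control")
    # Ratio of observed to expected proportions for each taxon.
    ratio = (obs / obs.sum()) / (true / true.sum())
    # Divide by the geometric mean to fix the arbitrary common scale.
    return ratio / np.exp(np.mean(np.log(ratio)))


def calibrate(observed_sample, efficiencies):
    """Divide out the estimated efficiencies and renormalize to proportions."""
    adjusted = np.asarray(observed_sample, dtype=float) / efficiencies
    return adjusted / adjusted.sum()


# Hypothetical mock community: three taxa in equal amounts, but taxon 3 is
# detected four times as efficiently as taxon 1.
eff = estimate_efficiencies([10, 20, 40], [1, 1, 1])
print(eff)
print(calibrate([25, 25, 50], eff))
```

Here the estimated efficiencies are approximately 0.5, 1 and 2. A sample that the sequencer reports as 25%/25%/50% is calibrated back to approximately 50%/25%/25%. The sketch requires every taxon to appear in the control and ignores sampling noise, zeros, and differences between controls. A full statistical model would need to account for these.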

## Key facts

- **NIH application ID:** 10439786
- **Project number:** 5R35GM133420-04
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Amy D Willis
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $306,265
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-09-01 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10439786

## Citation

> US National Institutes of Health, RePORTER application 10439786, Statistical methods to enhance reproducible microbiome discovery (5R35GM133420-04). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10439786. Licensed CC0.

