# Statistical methods for analyzing messy microbiome data: detection of hidden artifacts and robust modeling approaches

> **NIH R01** · JOHNS HOPKINS UNIVERSITY · 2022 · $380,193

## Abstract

Recent research has highlighted the importance of human-associated microbiota in many diseases and health conditions. Marker-gene amplicon and shotgun metagenomic sequencing (jointly, MGS) are now routinely used in epidemiological and clinical studies to investigate the health impact of the microbiome community. Many researchers also deposit MGS data, together with other data, in the public domain for others to investigate. Despite this increasing availability, MGS data analysis remains difficult. In addition to the classic statistical challenges inherent to MGS data, such as compositionality, sparsity, overdispersion, and the phylogenetic relationships among taxa, large-scale MGS studies feature additional complications, including experimental bias and hidden artifacts (batch effects), which will invalidate downstream analysis if not properly accounted for. Current analytic approaches largely ignore or insufficiently handle these difficulties.

This proposal aims to develop powerful and robust statistical methods for reproducible microbiome discoveries that adjust for unknown batch effects and are resistant to sequencing biases. In Aim 1, we will develop a novel approach to search for unmeasured artifacts through a new form of surrogate variable analysis and multiple quantile thresholding. Our approach advances existing surrogate variable analysis to specifically address the characteristics of MGS data, including differences in variability, sparsity, and zero inflation. In Aims 2 and 3, we will develop bias-resistant modeling for assessing microbiome-phenotype associations and for community-level analysis. We will also develop, distribute, and support user-friendly software for the proposed methods to benefit the entire research community. The proposed methods will be evaluated through extensive simulations and analysis of real microbiome data, including data from our motivating studies such as the VAPing Observational Research Study (VAPORS) and the New Hampshire Birth Cohort Study. Successful completion of this proposal will fill the gap between the increasing research interest in the microbiome and the lack of robust, bias-resistant tools, and will facilitate an in-depth understanding of the human microbiome in health and disease.

## Key facts

- **NIH application ID:** 10503637
- **Project number:** 1R01GM147162-01
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Ni Zhao
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $380,193
- **Award type:** 1
- **Project period:** 2022-09-23 → 2027-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10503637

## Citation

> US National Institutes of Health, RePORTER application 10503637, Statistical methods for analyzing messy microbiome data: detection of hidden artifacts and robust modeling approaches (1R01GM147162-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10503637. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
