Statistical methods for analyzing messy microbiome data: detection of hidden artifacts and robust modeling approaches

NIH RePORTER · NIH · R01 · $364,255

Abstract

Recent research has highlighted the importance of human-associated microbiota in many diseases and health conditions. Marker-gene amplicon and shotgun metagenomic sequencing (jointly, MGS) are now routinely used in epidemiological and clinical studies to investigate the health impact of the microbiome community. Many researchers also deposit MGS data, together with accompanying data, in the public domain for others to investigate. Despite this increasing availability, MGS data analysis remains difficult. In addition to the classic statistical challenges inherent to MGS data, such as compositionality, sparsity, overdispersion, and the phylogenetic relationships among taxa, large-scale MGS studies feature additional complications, including experimental bias and hidden artifacts (batch effects), which will invalidate downstream analyses if not properly accounted for. Current analytic approaches largely ignore or insufficiently handle these difficulties. This proposal aims to develop powerful and robust statistical methods for reproducible microbiome discoveries that adjust for unknown batch effects and are resistant to sequencing biases. In Aim 1, we will develop a novel approach to search for unmeasured artifacts through a new form of surrogate variable analysis and multiple quantile thresholding. Our approach extends existing surrogate variable analysis to specifically address the characteristics of MGS data, including differences in variability, sparsity, and zero inflation. In Aims 2 and 3, we will develop bias-resistant modeling for assessing microbiome-phenotype associations and for community-level analysis. We will also develop, distribute, and support user-friendly software for the proposed methods to benefit the entire research community.
The proposed methods will be evaluated through extensive simulations and analyses of real microbiome data, including data from our motivating studies, the VAPing Observational Research Study (VAPORS) and the New Hampshire Birth Cohort Study. Successful completion of this proposal will fill the gap between the increasing research interest in the microbiome and the lack of robust, bias-resistant tools, and will facilitate an in-depth understanding of the human microbiome in health and disease.
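
For readers unfamiliar with surrogate variable analysis, the general idea behind searching for unmeasured artifacts can be sketched as follows. This is only a generic illustration on simulated data, not the method proposed in the abstract: it does not include multiple quantile thresholding or any zero-inflation handling. The function names (`clr`, `estimate_surrogates`), the centered log-ratio transform, the pseudocount of 0.5, and the simulation settings are all assumptions made for the example.

```python
import numpy as np


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of a samples x taxa count matrix.

    The pseudocount is an illustrative choice for handling zeros.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


def estimate_surrogates(counts, covariates, n_surrogates=1):
    """Generic SVA-style sketch: residualize on known covariates, then
    take the leading left singular vectors of the residual matrix."""
    z = clr(counts)
    design = np.column_stack([np.ones(z.shape[0]), covariates])
    # Least-squares fit of every taxon on the known covariates.
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    residuals = z - design @ coef
    # Dominant residual directions are candidate hidden artifacts.
    u, _, _ = np.linalg.svd(residuals, full_matrices=False)
    return u[:, :n_surrogates]


# Simulated data (hypothetical): one known phenotype, one hidden batch.
rng = np.random.default_rng(0)
n_samples, n_taxa = 60, 40
phenotype = rng.integers(0, 2, size=n_samples)
hidden_batch = rng.integers(0, 2, size=n_samples)
batch_effect = rng.normal(0.0, 1.5, size=n_taxa)

# The batch shifts the log-abundance of each taxon by a taxon-specific amount.
log_mean = 3.0 + np.outer(hidden_batch, batch_effect)
counts = rng.poisson(np.exp(log_mean))

surrogate = estimate_surrogates(counts, phenotype)[:, 0]
batch_correlation = abs(np.corrcoef(surrogate, hidden_batch)[0, 1])
print(f"|correlation| between surrogate and hidden batch: {batch_correlation:.2f}")
```

In this toy setting the leading surrogate variable should track the simulated batch label closely. Real MGS data are sparser and more heteroscedastic than the simulation, which is the gap the abstract says the proposed extension is meant to address.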

Key facts

NIH application ID
10914026
Project number
5R01GM147162-03
Recipient
JOHNS HOPKINS UNIVERSITY
Principal Investigator
Ni Zhao
Activity code
R01
Funding institute
NIH
Fiscal year
2024
Award amount
$364,255
Award type
5
Project period
2022-09-23 → 2027-08-31