# Statistical Methods for Microbiome and Metagenomics

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2020 · $460,844

## Abstract

The broad, long-term objective of this project is to develop novel statistical methods and computational tools for statistical and probabilistic modeling of human microbiome and shotgun metagenomic data, motivated by important biological questions and experiments. The specific aim of the current project is to develop new statistical models, novel inference procedures, and fast computational algorithms for the analysis of 16S rRNA and shotgun metagenomic sequencing data in large-scale human microbiome studies. The project focuses on the development of model-based multi-sample approaches for quantifying microbiome compositions and on the development of methods for compositional mediation analysis that quantify how the microbiome mediates the effect of a treatment or risk factor on outcomes. In addition, this project will develop novel methods for statistical inference, including large-scale multiple testing procedures on sparse discrete Markov random field (MRF) models for microbial interaction network construction and for differential network analysis. These problems are all motivated by the PI's close collaborations with Penn investigators on metagenomic studies of Crohn's disease, childhood obesity, and disease progression among patients with chronic kidney disease (CKD). The methods hinge on a novel integration of biological insights with methods for modeling sparse count data, high-dimensional compositional data analysis, and network-based analysis, including nuclear-norm penalized maximum likelihood estimation for taxa abundance estimation, a compositional mediation model, and Markov random field based microbial network and differential network analysis. The new methods can be applied to both 16S rRNA and shotgun metagenomic sequencing data and will ideally facilitate the identification of the microbial compositions, subcompositions, and microbial networks underlying various complex human diseases and biological processes. The project will also investigate the robustness, power, and efficiency of these methods and compare them with existing methods. In addition, this project will develop practical and feasible computer programs that implement the proposed methods, and will evaluate the performance of these methods through extensive simulations and analysis of various ongoing microbiome studies through the PI's collaborations with Penn physicians and biologists. The work proposed here will contribute statistical methodology for modeling metagenomic sequencing data and high-dimensional compositional data, contribute theoretical inference methods for the MRF models, and offer insights into each of the biological areas represented by the various data sets. All programs developed under this grant, along with detailed documentation, will be made available free of charge to interested researchers.

## Key facts

- **NIH application ID:** 9983111
- **Project number:** 5R01GM123056-04
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** Hongzhe Lee
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $460,844
- **Award type:** 5
- **Project period:** 2017-09-15 → 2022-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9983111

## Citation

> US National Institutes of Health, RePORTER application 9983111, Statistical Methods for Microbiome and Metagenomics (5R01GM123056-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9983111. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
