# Novel statistical methods for controlled variable selection of microbiome data

> **NIH R21** · PENNSYLVANIA STATE UNIV HERSHEY MED CTR · 2021 · $225,714

## Abstract

The scientific community is increasingly appreciative of the important role that the microbiome community
plays in many diseases and health conditions. The structure of the microbiome community (e.g., relative
abundances of different taxa and microbial networks/interactions) is subject to change in response to many
environmental and host factors. Investigating how microbes interact with each other, with their
environment, and with their host can shed light on the biological mechanisms underlying
microbiome-related diseases and health conditions. Despite the incredible amount of research interest and
availability of massive data through the innovative use of cutting-edge techniques (16S rRNA gene
sequencing, shotgun metagenomics sequencing and metabolomics), there are still too few statistical tools
that can fully handle the complexity of microbiome data, including high dimensionality, phylogenetic
relatedness, relatively small sample sizes, and compositional constraints. The main goal of this proposal is
to develop statistically powerful and computationally efficient methods to address these challenges in analyzing
microbiome data. In particular, this research will be applied to high-throughput microbiome data and lead to
new statistical controlled variable selection methods that 1) select a subgroup of taxa that are genuinely
associated with disease-related outcomes under a pre-specified false discovery rate (FDR), where the
outcomes can be either a single disease outcome of interest or multivariate such as multiple secondary
phenotypes related to the disease; and 2) identify taxa and taxa-metabolite interactions that are associated
with a disease outcome under a certain FDR threshold. Our proposed methods are innovative in that they
both select important taxa or taxa-metabolite interactions and control the FDR, which
greatly enhances the reproducibility and reliability of discoveries in microbiome association studies.
The enhanced taxa selection would further facilitate downstream laboratory-based functional studies,
eventually leading to potential improvements in prevention, detection, treatment and monitoring of many health
and disease conditions from a microbiome perspective. Completion of this proposal will also help bridge the
gap between the burgeoning research interest in microbiome studies and the lack of analytical tools. In
addition to publishing in peer-reviewed journals, we will disseminate our results through conferences
and open-source software that is freely available to the wider scientific community. The proposed methods are
essential for an improved understanding of microbiome mechanisms and their interaction with the host genome or
metabolome in the pathology of certain diseases, which are of central importance to human health.

## Key facts

- **NIH application ID:** 10116262
- **Project number:** 5R21AI144765-02
- **Recipient organization:** PENNSYLVANIA STATE UNIV HERSHEY MED CTR
- **Principal Investigator:** Vernon M Chinchilli
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $225,714
- **Award type:** 5
- **Project period:** 2020-03-01 → 2023-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10116262

## Citation

> US National Institutes of Health, RePORTER application 10116262, Novel statistical methods for controlled variable selection of microbiome data (5R21AI144765-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10116262. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*