# A Computational Framework for Protein Identification and Quantification in Metaproteomics Using Data-Independent Acquisition

> **NIH R15** · UNIVERSITY OF NORTH TEXAS · 2020 · $361,302

## Abstract

Extensive efforts to characterize the human microbiome have greatly increased knowledge of the
diversity of the microbiome and of its composition in health and in disease. Dysbiosis in human microbiota
underlies the development of many diseases, such as obesity, diabetes, and inflammatory bowel disease.
Metaproteomics based on mass spectrometry (MS) has become widely used in microbiome research for gaining
insights into the functional states of microbial communities. Mass spectrometry with data-dependent acquisition
(DDA) is the most common method for identifying and quantifying microbial proteins in metaproteomics,
but this technique is fundamentally limited in terms of reproducibility and comprehensiveness. Proteomics using
data-independent acquisition (DIA) can, in theory, resolve the fundamental problems associated with the DDA
method. However, the lack of bioinformatics tools still presents unresolved challenges in the context of DIA, and
only a few DIA applications to the microbiome or host-microbe interactions have been reported.

MS-based metaproteomics is challenging because samples are highly complex, containing thousands of species
at vastly different abundances. Obtaining a comprehensive characterization of the functional state of microbial
communities requires considering proteins not just from dominant microorganisms but also from low-abundance
microorganisms. This proposal addresses the need for identifying and quantifying such proteins by providing
a set of computational tools that use DIA data to identify and quantify peptides and their variants
at the microbial strain level. False peptide identifications are controlled by newly proposed methods for false
discovery rate assessment at multiple granularities. Protein inference and quantification are optimized by
linear programming models that incorporate information from genome/transcriptome sequencing data and
metaproteome sample replicates. These improvements will increase the number of identified protein variants,
especially those from low-abundance microorganisms, which can help accurately characterize the functional
composition of microbial communities and reveal functional redundancy.
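
As a rough illustration of what false discovery rate control involves, the sketch below estimates FDR with the widely used target-decoy approach. This is a generic technique shown for orientation only, not the newly proposed multi-granularity method the abstract refers to. The function name `score_cutoff_at_fdr`, the `(score, is_decoy)` input layout, and the default threshold are all hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    psms: list of (score, is_decoy) pairs; a higher score means a better match.
    The FDR among matches accepted at a cutoff is estimated as
    (number of decoys) / (number of targets). Returns None if no cutoff qualifies.
    Assumption: scores are distinct; ties are not handled specially.
    """
    # Walk the matches from best to worst score, accumulating counts.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the deepest point in the ranking that still meets the FDR limit.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical scores, purely for illustration.
example_psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, True)]
example_cutoff = score_cutoff_at_fdr(example_psms, max_fdr=0.25)
```

In this toy input, accepting every match scoring 6 or higher includes one decoy among four targets, which is the deepest point at which the estimated FDR is still within 25%.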

## Key facts

- **NIH application ID:** 10047086
- **Project number:** 1R15LM013460-01
- **Recipient organization:** UNIVERSITY OF NORTH TEXAS
- **Principal Investigator:** Xuan Guo
- **Activity code:** R15
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $361,302
- **Award type:** 1
- **Project period:** 2020-07-01 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10047086

## Citation

> US National Institutes of Health, RePORTER application 10047086, A Computational Framework for Protein Identification and Quantification in Metaproteomics Using Data-Independent Acquisition (1R15LM013460-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10047086. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*