# Machine learning-based methods for the analysis of microbial glycomes and proteomes in inflammatory bowel disease.

> **NIH K08** · MASSACHUSETTS GENERAL HOSPITAL · 2024 · $169,560

## Abstract

Inflammatory bowel disease (IBD) affects over 1.2 million patients in the United States and causes significant
morbidity and healthcare expenditures. Studies have associated the development of IBD with changes in the
human gut microbiome, which, together with genetic and environmental factors, alter immune responses to gut
flora and cause chronic inflammation. The surface and secreted proteins and glycans of gut microbes mediate
many aspects of these immune interactions. However, the study of these molecules is limited by the extreme
complexity of the gut environment. Standard proteomic techniques capture only a small fraction of the predicted
microbial gene products, while metagenomic analyses using automated annotations fail to identify functions for
nearly half of all predicted proteins. The dietary, host, and microbial contributions to the diverse carbohydrate
pool also make the analysis of microbial glycans in stool samples highly challenging. The incomplete evaluation
of microbial surface and secreted proteins and microbial glycans impedes the discovery of new biological insights
into IBD. Machine learning algorithms, and especially advancements in natural language processing (NLP)
based on deep neural networks, have enabled major improvements in the accuracy of a number of tasks related
to human speech and written text. These deep neural networks function by analyzing massive collections of texts
and then creating high-dimensional vectors to represent the semantic meaning of words without the need for
specific labels. Biological polymers such as DNA, proteins, and glycans are also long complex sequences, and
application of NLP techniques has enabled accurate predictions of the functional and structural characteristics of
proteins and glycans from primary sequence. We hypothesize that machine learning methods incorporating deep
neural networks can be successfully applied to the analysis of microbial metagenomes and glycomes to identify
previously unknown perturbations in IBD. We will test this hypothesis with the following aims: 1) Develop and
adapt deep learning algorithms to analyze the surface-associated and secreted gut microbial metaproteome in
IBD; 2) Create and apply deep learning algorithms to analyze the fecal microbial glycome in IBD; 3)
Experimentally validate the functions of a subset of novel microbial proteins and glycans that are altered in IBD.
The long-term goal of this project is to discover new biological insights into the pathogenesis, progression, and
treatment of IBD. This proposal comprises a five-year research career development program focused on the
creation and adaptation of deep learning algorithms to the analysis of gut microbial metagenomic and glycomic
data. The candidate is an Instructor of Medicine at Harvard Medical School and the Division of Gastroenterology
at Massachusetts General Hospital. He has assembled an outstanding group of collaborators and advisors with
deep expertise in machine learning, glycobiolog...

## Key facts

- **NIH application ID:** 10759400
- **Project number:** 5K08DK132516-02
- **Recipient organization:** MASSACHUSETTS GENERAL HOSPITAL
- **Principal Investigator:** Xiao Tan
- **Activity code:** K08
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $169,560
- **Award type:** 5
- **Project period:** 2023-01-04 → 2027-11-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10759400

## Citation

> US National Institutes of Health, RePORTER application 10759400, Machine learning-based methods for the analysis of microbial glycomes and proteomes in inflammatory bowel disease. (5K08DK132516-02). Retrieved via AI Analytics 2026-06-25 from https://api.ai-analytics.org/grant/nih/10759400. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
