Inflammatory bowel disease (IBD) affects over 1.2 million patients in the United States and causes significant morbidity and healthcare expenditures. Studies have linked the development of IBD to changes in the human gut microbiome, which, together with genetic and environmental factors, alter immune responses to gut flora and lead to chronic inflammation. The surface and secreted proteins and glycans of gut microbes mediate many aspects of these immune interactions. However, studying these molecules is difficult because of the extreme complexity of the gut environment. Standard proteomic techniques capture only a small fraction of the predicted microbial gene products, while metagenomic analyses that rely on automated annotation fail to assign functions to nearly half of all predicted proteins. In addition, because diet, the host, and microbes all contribute to the diverse carbohydrate pool, analyzing microbial glycans in stool samples is highly challenging. This incomplete characterization of microbial surface and secreted proteins and of microbial glycans impedes the discovery of new biological insights into IBD. Machine learning, and especially advances in natural language processing (NLP) based on deep neural networks, has substantially improved accuracy on a range of tasks involving human speech and written text. These deep neural networks are trained on massive collections of text and learn high-dimensional vectors that represent the semantic meaning of words, without the need for specific labels. Biological polymers such as DNA, proteins, and glycans are also long, complex sequences, and applying NLP techniques to them has enabled accurate prediction of the functional and structural characteristics of proteins and glycans from their primary sequences.
We hypothesize that machine learning methods incorporating deep neural networks can be applied to the analysis of microbial metagenomes and glycomes to identify previously unknown perturbations in IBD. We will test this hypothesis through the following aims: 1) Develop and adapt deep learning algorithms to analyze the surface-associated and secreted gut microbial metaproteome in IBD; 2) Create and apply deep learning algorithms to analyze the fecal microbial glycome in IBD; 3) Experimentally validate the functions of a subset of novel microbial proteins and glycans that are altered in IBD. The long-term goal of this project is to gain new biological insights into the pathogenesis, progression, and treatment of IBD. This proposal describes a five-year research career development program focused on creating and adapting deep learning algorithms for the analysis of gut microbial metagenomic and glycomic data. The candidate is an Instructor of Medicine at Harvard Medical School, based in the Division of Gastroenterology at Massachusetts General Hospital. He has assembled an outstanding group of collaborators and advisors with deep expertise in machine learning, glycobiolog...