Large-scale annotation-free disease correlation analysis of the iHMP

NIH RePORTER · NIH · R03 · $304,918

Abstract

Project Summary: We will use the iHMP data resource to apply novel tools and data analysis methodologies to the challenge of finding disease associations between large microbiome data sets, Inflammatory Bowel Disease (IBD), and the onset of diabetes. We will start with an annotation-free approach that uses k-mers to preprocess the IBD and diabetes cohorts. We will then apply a novel scaling technology implemented in the sourmash software to reduce the data set size by a factor of 2000, making it tractable for machine learning approaches. Next, we will use random forests to identify a subset of predictive k-mers and measure their accuracy on validation data sets not used in the initial training. Finally, we will annotate the predictive k-mers using all available genome databases, together with a novel method for inferring the metagenomic presence of accessory genomes of known genomes. Our outcomes will include a catalog of microbial genomes that correlate with IBD subtype and the onset of diabetes, as well as automated workflows for applying similar approaches to other data sets.
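
The k-mer downsampling step described above can be illustrated with a minimal sketch. It is not the sourmash implementation: the function names, the k-mer size of 31, and the use of BLAKE2b as the hash function are assumptions chosen for illustration (sourmash uses its own hash function and data structures). The idea shown is that each canonical k-mer is hashed to a 64-bit integer and kept only if the hash falls in the lowest 1/scaled fraction of the hash space, so a scaled value of 2000 retains roughly one k-mer in 2000.

```python
import hashlib

MAX_HASH = 2 ** 64  # size of the 64-bit hash space
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmers(seq, k):
    """Yield canonical k-mers from seq, skipping any window with non-ACGT characters."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield canonical(kmer)


def hash_kmer(kmer):
    """Hash a k-mer to a 64-bit integer (BLAKE2b is a stand-in, an assumption)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def scaled_sketch(seq, k=31, scaled=2000):
    """Keep only hashes below MAX_HASH / scaled, i.e. about 1/scaled of all k-mers."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def jaccard(a, b):
    """Jaccard similarity of two sketches built with the same k and scaled values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because the same hash threshold is applied to every sample, sketches built with identical k and scaled values can be compared directly, for example with jaccard(scaled_sketch(sample_a), scaled_sketch(sample_b)), where sample_a and sample_b are hypothetical sequence strings. The retained hashes could also serve as the columns of a sample-by-k-mer feature matrix for a downstream classifier.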

Key facts

NIH application ID: 10112077
Project number: 1R03OD030596-01
Recipient: UNIVERSITY OF CALIFORNIA AT DAVIS
Principal Investigator: C. Titus BROWN
Activity code: R03
Funding institute: NIH
Fiscal year: 2020
Award amount: $304,918
Award type: 1
Project period: 2020-09-15 → 2022-04-30