Developing CRISPR repeats as a means of phylogenetically profiling metagenomic data

NIH RePORTER · NIH · R21 · $111,291

Abstract

Metagenomic sequencing provides a means of interrogating the genetic diversity and composition of a microbial environment. However, interpreting these data remains a challenge. Current methods for taxonomic profiling rely on the 16S rRNA gene, k-mer sequence composition, or nominally single-copy marker genes, and none of these techniques is perfect. We propose to develop a new computational method that uses information inherent in the CRISPR-Cas prokaryotic immune system: namely, the repeat sequences in each CRISPR array evolve over time and thus contain phylogenetic information. While this technique will only be applicable to CRISPR-containing microbes, CRISPR can be found in >40% of completely sequenced prokaryotic genomes and has the potential to reveal fine-scale composition differences within that subset. We aim to: 1) evaluate the extent to which CRISPR repeat diversity and taxonomy covary at different taxonomic ranks; 2) build a probabilistic mixture model to infer the most likely community profile from a set of CRISPR repeats; and 3) compare the predictive performance of our inference method to existing methods on both simulated metagenomic data and actual environmental samples.
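As a rough illustration of what the mixture model in aim 2 could look like, the sketch below estimates community proportions from observed repeat counts with a plain expectation-maximization loop. It is not the project's method: the function name `estimate_profile`, the taxon and repeat labels, and the assumption that each taxon has a known, fixed probability of producing each repeat are all hypothetical and chosen only for the example.

```python
def estimate_profile(repeat_counts, emission, n_iter=500, tol=1e-10):
    """Estimate mixing proportions of taxa by EM (illustrative sketch).

    repeat_counts: dict mapping repeat label -> observed count
    emission: dict mapping taxon -> {repeat label: P(repeat | taxon)}
              (assumed known and fixed; a hypothetical simplification)
    """
    taxa = sorted(emission)
    # Start from a uniform community profile.
    weights = {t: 1.0 / len(taxa) for t in taxa}
    for _ in range(n_iter):
        expected = {t: 0.0 for t in taxa}
        # E-step: share each repeat's count among taxa by posterior probability.
        for repeat, count in repeat_counts.items():
            joint = {t: weights[t] * emission[t].get(repeat, 0.0) for t in taxa}
            norm = sum(joint.values())
            if norm == 0.0:
                continue  # no taxon can produce this repeat; ignore it
            for t in taxa:
                expected[t] += count * joint[t] / norm
        assigned = sum(expected.values())
        if assigned == 0.0:
            raise ValueError("no observed repeat is explained by any taxon")
        # M-step: new proportions are the normalized expected counts.
        new_weights = {t: expected[t] / assigned for t in taxa}
        delta = max(abs(new_weights[t] - weights[t]) for t in taxa)
        weights = new_weights
        if delta < tol:
            break
    return weights


# Toy input: two hypothetical taxa that share one repeat.
emission = {
    "taxon_A": {"repeat_1": 0.5, "repeat_shared": 0.5},
    "taxon_B": {"repeat_2": 0.5, "repeat_shared": 0.5},
}
repeat_counts = {"repeat_1": 30, "repeat_2": 10, "repeat_shared": 40}
profile = estimate_profile(repeat_counts, emission)
print(profile)
```

In this toy input the shared repeat carries no information about which taxon produced it, so the estimate is driven by the taxon-specific repeats and settles near 0.75 for `taxon_A` and 0.25 for `taxon_B`. A realistic model would also have to learn or look up the repeat-to-taxon probabilities and handle sequence divergence among repeats, which this sketch leaves out.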

Key facts

NIH application ID: 10511355
Project number: 1R21GM147759-01
Recipient: UNIV OF MARYLAND, COLLEGE PARK
Principal Investigator: Philip Lee Falk Johnson
Activity code: R21
Funding institute: NIH
Fiscal year: 2022
Award amount: $111,291
Award type: 1
Project period: 2022-09-15 → 2024-08-31