# Developing CRISPR repeats as a means of phylogenetically profiling metagenomic data

> **NIH R21** · UNIV OF MARYLAND, COLLEGE PARK · 2022 · $111,291

## Abstract

Metagenomic sequencing provides a means of interrogating the genetic diversity and composition of a microbial environment. However, interpreting these data remains a challenge: current methods for taxonomic profiling rely on the 16S rRNA gene, k-mer sequence composition, or nominally single-copy marker genes, and none of these techniques is perfect. We propose to develop a new computational method that uses information inherent in the CRISPR-Cas prokaryotic immune system: the repeat sequences in each CRISPR array evolve over time and thus contain phylogenetic information. Although this technique will apply only to CRISPR-containing microbes, CRISPR arrays are found in >40% of completely sequenced prokaryotic genomes, and the approach has the potential to reveal fine-scale compositional differences within that subset. We aim to: 1) evaluate the extent to which CRISPR repeat diversity and taxonomy covary at different taxonomic ranks; 2) build a probabilistic mixture model to infer the most likely community profile from a set of CRISPR repeats; and 3) compare the predictive performance of our inference method with that of existing methods on both simulated metagenomic data and real environmental samples.
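The abstract does not specify the form of the Aim 2 mixture model. What follows is a minimal sketch of the general idea, built on several assumptions. CRISPR repeats are assumed to be already extracted from the metagenome, and each candidate taxon is assumed to have a fixed reference set of equal-length repeats. A repeat's likelihood under a taxon is assumed to follow a simple per-base mismatch model with a hypothetical error rate `eps`. Only the mixing proportions are estimated, by expectation-maximization (EM). All function names and toy sequences are hypothetical illustrations. The L1 distance stands in for whatever comparison metric Aim 3 actually uses. None of this is the project's method.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length repeats")
    return sum(x != y for x, y in zip(a, b))


def repeat_likelihood(repeat: str, reference_repeats: list, eps: float = 0.01) -> float:
    """Hypothetical P(repeat | taxon): average over the taxon's reference repeats
    of a simple per-base mismatch model, eps**d * (1 - eps)**(n - d)."""
    n = len(repeat)
    total = 0.0
    for ref in reference_repeats:
        d = hamming(repeat, ref)
        total += (eps ** d) * ((1 - eps) ** (n - d))
    return total / len(reference_repeats)


def em_profile(observed, reference, eps=0.01, max_iter=200, tol=1e-10):
    """Estimate taxon mixing proportions by EM with fixed component likelihoods."""
    taxa = list(reference)
    # Precompute the likelihood of every observed repeat under every taxon.
    lik = [[repeat_likelihood(r, reference[t], eps) for t in taxa] for r in observed]
    pi = [1.0 / len(taxa)] * len(taxa)  # start from a uniform profile
    for _ in range(max_iter):
        # E-step: expected number of observed repeats assigned to each taxon.
        counts = [0.0] * len(taxa)
        for row in lik:
            weighted = [p * l for p, l in zip(pi, row)]
            z = sum(weighted)
            if z == 0.0:
                continue  # repeat not explained by any taxon; ignore it
            for t, w in enumerate(weighted):
                counts[t] += w / z
        total = sum(counts)
        if total == 0.0:
            raise ValueError("no repeats could be assigned to any taxon")
        # M-step: proportions are the normalized expected counts.
        new_pi = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return dict(zip(taxa, pi))


def l1_distance(p: dict, q: dict) -> float:
    """L1 distance between two community profiles (0 means identical)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy example with two hypothetical taxa and made-up repeat sequences.
reference = {
    "taxon_A": ["ACGTACGTACGT"],
    "taxon_B": ["TTGCATTGCATG"],
}
observed = ["ACGTACGTACGT"] * 3 + ["TTGCATTGCATG"]
estimate = em_profile(observed, reference)
print(estimate)
print(l1_distance(estimate, {"taxon_A": 0.75, "taxon_B": 0.25}))
```

With three copies of one toy repeat and one of the other, the estimate converges to proportions near 0.75 and 0.25. Each taxon's likelihood averages over its reference repeats. As a result, a repeat that resembles several taxa is split among them in proportion to their current weights. That sharing is where a mixture model goes beyond simple best-hit assignment.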

## Key facts

- **NIH application ID:** 10511355
- **Project number:** 1R21GM147759-01
- **Recipient organization:** UNIV OF MARYLAND, COLLEGE PARK
- **Principal Investigator:** Philip Lee Falk Johnson
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $111,291
- **Award type:** 1 (new application)
- **Project period:** 2022-09-15 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10511355

## Citation

> US National Institutes of Health, RePORTER application 10511355, Developing CRISPR repeats as a means of phylogenetically profiling metagenomic data (1R21GM147759-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10511355. Licensed CC0.

