# Computational framework for analyzing and annotating single bacterium RNA-Seq data

> **NIH R21** · NEW YORK UNIVERSITY SCHOOL OF MEDICINE · 2022 · $254,250

## Abstract

Pathogenesis of infectious bacterial disease relies on the ability of bacteria to deploy gene expression programs
rapidly and flexibly to adapt to new environments. Heterogeneous expression of these genetic programs is a
common strategy invoked by bacterial populations, particularly in the expression of virulence factors. To study
heterogeneous gene expression, single-cell technologies were designed for use in eukaryotes and have been
extremely impactful in that setting, but use of these technologies has been challenging in bacteria. Recent
advances have led to significant improvement in single-bacterium RNA-Seq, making it possible to now measure
gene expression in hundreds of thousands of cells in a single experiment. However, analysis of such datasets is
hampered by the extremely low number of transcripts detected in each cell, for both technical and biological reasons.
The study of bacterial pathogenesis thus requires new computational and conceptual frameworks to enable
analysis of single-bacterium RNA-Seq datasets. Here, we propose a set of innovative tools that exploit unique
aspects of bacterial physiology to address the challenging features of these data. In our first Aim, we propose
the first RNA-Seq denoising approach specifically tailored for single-bacterium data. This approach exploits
high cell numbers to identify modules of co-varying genes and then uses the latent space
derived from module expression to smooth each cell over its neighbors, yielding more highly
resolved transcriptome profiles. In our second Aim, we seek to annotate cells according to their replicative and
growth rates by integrating parameters directly derived from bacterial cell biology. Specifically, because a
bacterium generally has a single origin of replication, we will use higher expression of genes closer to the
origin as an indicator of a replicating genome. In addition, we will infer a cell’s growth rate on
the basis of the principle that rapidly growing cells have a higher abundance of pre-noncoding RNA relative to
their mature counterparts. In our final Aim, we will release and maintain a computational package with analysis
tools for use by the pathogenicity community. The single-bacterium RNA-Seq field is fast-growing and requires
computational support that will enable progress in elucidating the mechanisms of antibiotic tolerance and
bacterial pathogenesis.
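
The two Aim 2 annotations can be illustrated with a minimal Python sketch. It is not the method proposed in the grant: the function names `origin_bias_score` and `precursor_ratio`, the scoring formulas, the pseudocount, and the assumption that the terminus sits directly opposite the origin on a circular chromosome are all hypothetical choices made for illustration.

```python
import numpy as np


def origin_bias_score(counts, gene_pos, origin_pos, genome_len):
    """Per-cell bias of expression toward the origin of replication.

    counts     : (cells x genes) array of transcript counts
    gene_pos   : (genes,) genomic coordinate of each gene
    origin_pos : coordinate of the origin on the circular chromosome
    genome_len : chromosome length, in the same units as the coordinates

    Returns values in [-1, 1]: +1 if all transcripts come from genes at the
    origin, -1 if all come from genes at the assumed terminus, and NaN for
    cells with no detected transcripts.
    """
    counts = np.asarray(counts, dtype=float)
    offset = np.abs(np.asarray(gene_pos, dtype=float) - origin_pos) % genome_len
    # Shortest distance around the circle, scaled so the terminus is 1.0.
    dist = np.minimum(offset, genome_len - offset) / (genome_len / 2)
    totals = counts.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Expression-weighted mean distance from the origin for each cell.
        mean_dist = (counts @ dist) / totals
    return 1.0 - 2.0 * mean_dist


def precursor_ratio(precursor_counts, mature_counts, pseudocount=1.0):
    """Log2 ratio of precursor to mature noncoding RNA reads per cell.

    The pseudocount (an illustrative choice) avoids division by zero in
    sparse cells.
    """
    pre = np.asarray(precursor_counts, dtype=float) + pseudocount
    mat = np.asarray(mature_counts, dtype=float) + pseudocount
    return np.log2(pre / mat)


# Toy data: two genes, one at the origin and one at the terminus.
counts = np.array([[10, 0], [0, 10], [5, 5], [0, 0]])
scores = origin_bias_score(counts, gene_pos=[0, 500], origin_pos=0, genome_len=1000)
ratios = precursor_ratio([3, 0], [1, 3])
```

On real data, a score like this would likely need to account for gene-specific baseline expression, and, given how few transcripts are detected per cell, would probably be more stable when computed on denoised or neighbor-pooled profiles than on raw counts.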

## Key facts

- **NIH application ID:** 10444669
- **Project number:** 1R21AI169350-01
- **Recipient organization:** NEW YORK UNIVERSITY SCHOOL OF MEDICINE
- **Principal Investigator:** ITAI YANAI
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $254,250
- **Award type:** 1
- **Project period:** 2022-04-08 → 2024-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10444669

## Citation

> US National Institutes of Health, RePORTER application 10444669, Computational framework for analyzing and annotating single bacterium RNA-Seq data (1R21AI169350-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10444669. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*