Computational framework for analyzing and annotating single bacterium RNA-Seq data

NIH RePORTER · NIH · R21 · $254,250

Abstract

Pathogenesis of infectious bacterial disease relies on the ability of bacteria to deploy gene expression programs rapidly and flexibly as they adapt to new environments. Bacterial populations commonly express these programs heterogeneously, particularly those for virulence factors. Single-cell technologies for studying heterogeneous gene expression were designed for eukaryotes and have been extremely impactful in that setting, but applying them to bacteria has been challenging. Recent advances have substantially improved single-bacterium RNA-Seq, making it possible to measure gene expression in hundreds of thousands of cells in a single experiment. However, these datasets are difficult to analyze because, for both technical and biological reasons, very few transcripts are detected in each cell. The study of bacterial pathogenesis therefore requires new computational and conceptual frameworks for analyzing single-bacterium RNA-Seq data. Here, we propose a set of innovative tools that exploit unique aspects of bacterial physiology to address these challenging features of the data.

In our first Aim, we propose the first RNA-Seq denoising approach tailored specifically to single-bacterium data. The approach leverages the large number of cells to identify modules of genes with co-varying expression profiles, then uses the latent space of cells defined by module expression to smooth each cell's profile over its neighbors, yielding more highly resolved transcriptome profiles.

In our second Aim, we will annotate cells by their replication and growth rates, using parameters derived directly from bacterial cell biology. Specifically, we will exploit the fact that a bacterium typically replicates its genome from a single origin of replication: while replication is under way, genes near the origin have already been copied and are present in more copies than genes near the terminus, so relatively higher expression of origin-proximal genes indicates an actively replicating genome.
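The first Aim's strategy (find co-varying gene modules, build a latent space from module expression, then smooth each cell over its neighbors in that space) can be illustrated with a minimal NumPy sketch. It is not the project's method, which the abstract does not specify: connected components of a thresholded gene-gene correlation graph stand in for module detection, module scores are mean log-normalized expression, and smoothing is a plain k-nearest-neighbor average. The function names, the thresholds `r_min` and `k`, and the simulated data are all hypothetical.

```python
import numpy as np


def normalize(counts):
    """Scale each cell to the median library size, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1  # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * np.median(totals))


def gene_modules(expr, r_min=0.3, min_size=2):
    """Hypothetical module finder: connected components of the graph whose
    edges join gene pairs with Pearson correlation above r_min."""
    z = expr - expr.mean(axis=0)
    sd = z.std(axis=0)
    sd[sd == 0] = np.inf  # constant genes get zero correlation with all genes
    z = z / sd
    corr = z.T @ z / expr.shape[0]
    adj = corr > r_min
    seen = np.zeros(expr.shape[1], dtype=bool)
    modules = []
    for seed in range(expr.shape[1]):
        if seen[seed]:
            continue
        seen[seed] = True
        stack, members = [seed], []
        while stack:  # depth-first search over correlated genes
            g = stack.pop()
            members.append(g)
            for nb in np.flatnonzero(adj[g] & ~seen):
                seen[nb] = True
                stack.append(int(nb))
        if len(members) >= min_size:
            modules.append(sorted(members))
    return modules


def module_latent_space(expr, modules):
    """Cells x modules matrix of z-scored mean module expression."""
    scores = np.column_stack([expr[:, m].mean(axis=1) for m in modules])
    sd = scores.std(axis=0)
    sd[sd == 0] = 1
    return (scores - scores.mean(axis=0)) / sd


def knn_smooth(counts, latent, k=10):
    """Average each cell's counts with its k-1 nearest neighbors in the
    latent space (exact Euclidean distances, O(cells^2) memory)."""
    sq = (latent ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * latent @ latent.T
    np.fill_diagonal(d2, -np.inf)  # guarantee each cell is its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    return counts[nbrs].mean(axis=1)


def denoise(counts, r_min=0.3, k=10):
    """Modules -> latent space -> kNN smoothing; returns (smoothed, modules)."""
    expr = normalize(counts)
    modules = gene_modules(expr, r_min=r_min)
    if not modules:
        raise ValueError("no co-varying gene modules found; try a lower r_min")
    latent = module_latent_space(expr, modules)
    return knn_smooth(counts.astype(float), latent, k=k), modules


# Toy data: two hidden cell states, each switching on one 10-gene module,
# on top of sparse background expression.
rng = np.random.default_rng(0)
n_cells, n_genes = 400, 40
state = rng.integers(0, 2, n_cells)
rates = np.full((n_cells, n_genes), 0.05)
rates[state == 0, :10] = 2.0
rates[state == 1, 10:20] = 2.0
counts = rng.poisson(rates)
smoothed, modules = denoise(counts, r_min=0.3, k=15)
print("modules:", modules)
```

Exact pairwise distances are only practical for small examples; at hundreds of thousands of cells, an approximate nearest-neighbor index would be needed instead.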
In addition, we will infer each cell's growth rate from the principle that rapidly growing cells contain more precursor noncoding RNA relative to the corresponding mature forms. In our final Aim, we will release and maintain a computational package of analysis tools for the bacterial pathogenesis community. The single-bacterium RNA-Seq field is growing quickly and needs computational support to enable progress in elucidating the mechanisms of antibiotic tolerance and bacterial pathogenesis.
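The two cell annotations proposed in the second Aim can be illustrated with a toy sketch, under assumptions the abstract does not state. The replication score is a hypothetical per-cell log-ratio of origin-proximal to terminus-proximal counts, each relative to what the population-wide share of every gene would predict. The growth score is a hypothetical log-ratio of precursor to mature noncoding-RNA reads. Gene positions, the origin coordinate, the split at half the origin-terminus distance, the pseudocount of 1, and the precursor/mature read counts are all illustrative inputs, and the simulation uses a crude linear copy-number gradient rather than a realistic replication-fork model.

```python
import numpy as np


def distance_from_origin(positions, ori, genome_length):
    """Fractional distance of each gene from the origin on a circular
    chromosome: 0 at the origin, 1 at the diametrically opposite point."""
    offset = np.abs(np.asarray(positions) - ori) % genome_length
    return np.minimum(offset, genome_length - offset) / (genome_length / 2)


def replication_score(counts, positions, ori, genome_length, split=0.5):
    """Hypothetical per-cell log-ratio of origin-proximal to terminus-proximal
    counts, relative to the population-wide share of each gene. Positive
    values mean origin-proximal genes are over-represented, as expected in a
    cell whose chromosome is partly replicated."""
    counts = np.asarray(counts, dtype=float)
    near = distance_from_origin(positions, ori, genome_length) < split
    gene_share = counts.sum(axis=0) / counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * gene_share
    obs_near, obs_far = counts[:, near].sum(axis=1), counts[:, ~near].sum(axis=1)
    exp_near, exp_far = expected[:, near].sum(axis=1), expected[:, ~near].sum(axis=1)
    # A pseudocount of 1 keeps the score finite for very sparse cells.
    return np.log((obs_near + 1) / (exp_near + 1)) - np.log((obs_far + 1) / (exp_far + 1))


def growth_score(pre_counts, mature_counts):
    """Hypothetical per-cell log-ratio of precursor to mature noncoding-RNA
    reads; higher values are read as faster growth."""
    pre = np.asarray(pre_counts, dtype=float)
    mature = np.asarray(mature_counts, dtype=float)
    return np.log((pre + 1) / (mature + 1))


# Toy data (all values hypothetical).
rng = np.random.default_rng(1)
genome_length, ori = 4_600_000, 0
n_cells, n_genes = 400, 300
positions = rng.integers(0, genome_length, n_genes)
d = distance_from_origin(positions, ori, genome_length)
replicating = rng.random(n_cells) < 0.5
# Crude linear copy-number gradient: 2 copies at the origin, 1 at the far end.
dosage = np.where(replicating[:, None], 2 - d[None, :], 1.0)
baseline = rng.lognormal(mean=-1.5, sigma=1.0, size=n_genes)
counts = rng.poisson(baseline * dosage)
rep = replication_score(counts, positions, ori, genome_length)

fast = rng.random(n_cells) < 0.5
pre = rng.poisson(np.where(fast, 6.0, 1.5))
mature = rng.poisson(20.0, n_cells)
grow = growth_score(pre, mature)
print(rep[replicating].mean(), rep[~replicating].mean())
print(grow[fast].mean(), grow[~fast].mean())
```

Comparing each cell against population-wide gene shares, instead of raw counts, keeps highly expressed genes from masking the small dosage shift between origin-proximal and terminus-proximal regions.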

Key facts

NIH application ID
10444669
Project number
1R21AI169350-01
Recipient
NEW YORK UNIVERSITY SCHOOL OF MEDICINE
Principal Investigator
ITAI YANAI
Activity code
R21
Funding institute
NIH
Fiscal year
2022
Award amount
$254,250
Award type
1
Project period
2022-04-08 → 2024-03-31