# A blind source separation approach for deconvolution of bulk transcriptional data leads to early detection of ATF cell-states in complex bacterial populations, in vitro and in vivo

> **NIH U19** · BROAD INSTITUTE, INC. · 2022 · $610,131

## Abstract

SUMMARY – PROJECT 3
Transient bacterial cell-states including tolerance, persistence and hetero-resistance (HR) are harbingers of
antibiotic treatment failure (ATF) and enablers of antibiotic resistance. Importantly, they are missed by all
currently employed diagnostic assays and antibiotic susceptibility tests. Intriguingly, in the treatment of different
types of cancer, physicians are often confronted with similar treatment failure issues. It turns out that these
epigenetic cell-states create extended opportunities for high-level resistance mutations to emerge. Moreover,
due to the phenotype’s transience, they themselves can directly drive the re-emergence of the (susceptible)
population after drug pressure subsides. As these cell-states are increasingly recognized as drivers that sit
at the root of treatment failure, new strategies are emerging to specifically identify, track and target them. To
achieve such highly targeted treatment, approaches are developed that map out the composition of complex
cancer tissue, for instance through single cell RNA-Seq (scRNA-Seq), or computational deconvolution of bulk
RNA-Seq data. While scRNA-Seq on bacteria remains technically challenging, we found that by modifying
existing tools, specific bacterial cell-states can be identified in complex bacterial populations. However, the
capabilities of current tools are limited, and through the implementation of state-of-the-art machine learning
algorithms there is much room for improvement. Moreover, ATF cell-states are poorly characterized, making it
currently impossible to effectively define them. Herein, 3 aims are pursued to develop an approach that, based
on bulk RNA-Seq data, dissects a complex bacterial population into its separate cell-states, and calculates their
frequencies and MICs. In Aim 1 a large and diverse temporal RNA-Seq dataset is generated by following a wide
variety of strains and species while they are exposed to antibiotics and a subset of the population switches to an
ATF cell state. In Aim 2 a blind source separation algorithm is explored to design a state-of-the-art machine
learning tool that deconvolves bulk RNA-Seq data from a complex bacterial population into the cell-states and
their frequencies that make up the population. Moreover, by reconstituting each cell-state’s expression profile
we enable transcriptional entropy calculations and thereby cell-state-specific MIC predictions. In Aim 3 the
approach is validated by retrospectively predicting the presence of ATF cell-states in patient samples. Finally,
the model’s applicability is extended to bulk dual RNA-Seq data from host and bacterium, and validated on
patient serum samples. This project therefore not only informs on how ATF cell-states develop and are
maintained in a population, but also creates a path towards the development of diagnostics that can detect them
in an active infection. Combined with the collateral sensitivities from Project 2 this could eventually enable linkin...
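
The deconvolution step described in Aim 2 can be illustrated with a minimal sketch. The abstract does not name the blind source separation algorithm or define how transcriptional entropy is computed, so everything below is an assumption made for illustration: non-negative matrix factorization (NMF) with multiplicative updates stands in for the separation method, Shannon entropy of a normalized expression profile stands in for the entropy measure, and the gene values, the `susceptible` and `tolerant` profiles, and all function names are hypothetical toy constructs rather than project data or project code.

```python
import math
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def nmf(X, k, iters=1000, seed=0, eps=1e-12):
    """Approximate X (genes x samples) as W (genes x k) times H (k x samples).

    Uses multiplicative updates for the squared-error loss; W and H stay
    non-negative. NMF is an assumed stand-in for the project's algorithm.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def frequencies(W, H):
    """Turn mixing weights into per-sample cell-state fractions.

    Each component's weight is scaled by the total expression of its profile,
    then every sample's weights are normalized to sum to 1.
    """
    scales = [sum(row[j] for row in W) for j in range(len(H))]
    scaled = [[h * s for h in row] for row, s in zip(H, scales)]
    totals = [sum(col) for col in zip(*scaled)]
    return [[v / t for v, t in zip(row, totals)] for row in scaled]


def relative_error(X, W, H):
    """Frobenius norm of (X - WH) divided by the Frobenius norm of X."""
    R = matmul(W, H)
    num = sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))
    den = sum(x ** 2 for xr in X for x in xr)
    return math.sqrt(num / den)


def shannon_entropy(profile):
    """Shannon entropy (bits) of an expression profile normalized to sum to 1.

    An assumed stand-in; the abstract does not define transcriptional entropy.
    """
    total = sum(profile)
    return -sum((v / total) * math.log2(v / total) for v in profile if v > 0)


# Hypothetical toy data: two cell-state profiles over six genes, mixed at
# increasing tolerant fractions across four bulk samples.
susceptible = [10.0, 8.0, 1.0, 1.0, 5.0, 2.0]
tolerant = [1.0, 2.0, 9.0, 7.0, 5.0, 1.0]
fractions = [0.0, 0.1, 0.3, 0.6]
X = [[(1 - f) * s + f * t for f in fractions]
     for s, t in zip(susceptible, tolerant)]

W, H = nmf(X, k=2)
freqs = frequencies(W, H)  # k x samples, each column sums to 1
entropies = [shannon_entropy(col) for col in zip(*W)]  # one value per component
```

NMF solutions are not unique, so the recovered components need not match the true profiles or fractions exactly without further constraints or reference data. Here `freqs` holds the estimated share of each component per sample, and `entropies` gives one entropy value per reconstituted profile.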

## Key facts

- **NIH application ID:** 10171121
- **Project number:** 1U19AI158076-01
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Tim van Opijnen
- **Activity code:** U19
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $610,131
- **Award type:** 1
- **Project period:** 2022-09-12 → 2026-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10171121

## Citation

> US National Institutes of Health, RePORTER application 10171121, A blind source separation approach for deconvolution of bulk transcriptional data leads to early detection of ATF cell-states in complex bacterial populations, in vitro and in vivo (1U19AI158076-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10171121. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
