A unified probabilistic model and software implementation for analysis of nascent RNA sequencing data

NIH RePORTER · NIH · R01 · $594,501

Abstract

The process by which RNA molecules are assembled from DNA templates, called transcription, is fundamental to all life and dysregulated in many human diseases. Over the past 15 years, studies of the mechanisms and dynamics of transcription have increasingly relied on a family of techniques for isolating and sequencing newly transcribed, or “nascent” RNAs. In contrast to standard RNA-seq, these nascent RNA sequencing (NRS) methods enable transcription to be measured separately from RNA degradation, respond rapidly to changes in transcription, and reveal the positions of RNA polymerases along a DNA template. However, NRS data require sophisticated computational and statistical methods for analysis, which are only beginning to emerge. Here, we propose to develop a powerful and flexible probabilistic modeling framework for the analysis of NRS data. Our framework is based on a highly general “unified model” that mathematically describes both the kinetics of transcription initiation, elongation, and promoter-proximal pause release, and the generation of sequencing read counts. It can be used to estimate transcriptional rates directly from NRS data, in either a steady-state or nonequilibrium setting. Our proposal includes three specific aims, focused on the development of (1) a series of statistical tests and machine-learning methods for differential analysis of transcription-associated rates; (2) a statistical and machine-learning framework and new experimental methods for characterizing variation in elongation rate and its dependency on genomic and epigenomic covariates; and (3) an open-source software package implementing these new methods in the R programming environment (STADyUM), integrated with the Bioconductor, AnVIL, and PyTorch environments. Successful completion of these aims will result in powerful, versatile, and highly accessible new computational tools that will accelerate progress in transcriptional research.
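
As a rough illustration of what estimating rates "directly from NRS data" can look like in a steady-state setting, the sketch below assumes that per-nucleotide read counts within a region are independent Poisson draws with a constant rate. This is a generic simplification, not the proposal's unified model or the STADyUM API; the function names, the two-region layout (pause region and gene body), and the toy counts are all hypothetical.

```python
from typing import Sequence


def poisson_rate_mle(counts: Sequence[int]) -> float:
    """Maximum-likelihood estimate of a constant per-position Poisson rate.

    For independent Poisson counts with a shared rate, the MLE is the
    sample mean of the counts.
    """
    if len(counts) == 0:
        raise ValueError("counts must be non-empty")
    return sum(counts) / len(counts)


def pause_to_body_ratio(pause_counts: Sequence[int],
                        body_counts: Sequence[int]) -> float:
    """Ratio of estimated read density in the pause region to the gene body.

    Hypothetical helper: both regions are assumed to have their own
    constant Poisson rate.
    """
    pause_rate = poisson_rate_mle(pause_counts)
    body_rate = poisson_rate_mle(body_counts)
    if body_rate == 0:
        raise ValueError("gene-body density is zero; ratio is undefined")
    return pause_rate / body_rate


# Toy per-nucleotide read counts (invented for illustration only).
pause_counts = [4, 6, 5, 5]
body_counts = [1, 0, 2, 1, 1, 1, 0, 2]

ratio = pause_to_body_ratio(pause_counts, body_counts)
print(f"pause density: {poisson_rate_mle(pause_counts):.2f}")
print(f"body density:  {poisson_rate_mle(body_counts):.2f}")
print(f"pause/body ratio: {ratio:.2f}")
```

Under this simplified assumption, a larger pause-to-body ratio would be consistent with polymerases dwelling longer near the promoter relative to their transit through the gene body. A full kinetic model of the kind the abstract describes would tie such densities to explicit initiation, pause-release, and elongation rates.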

Key facts

NIH application ID: 10801419
Project number: 1R01HG012944-01A1
Recipient: COLD SPRING HARBOR LABORATORY
Principal Investigator: Adam Charles Siepel
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $594,501
Award type: 1
Project period: 2024-09-16 → 2028-07-31