Mechanisms of Transcriptional Control Revealed by Nascent Transcript Sequencing

NIH RePORTER · NIH · R01 · $513,500

Abstract

Large consortium efforts have collected hundreds of genome-wide datasets that delineate myriad regulatory regions, transcription factor binding sites, and large numbers of coding and non-coding transcripts. Even with this massive amount of data, it remains a significant challenge to determine how the mapped elements function together in regulatory networks. This is due in large part to our inability to accurately and quantitatively detect all forms of nascent transcription, the instantaneous output of transcriptional regulation. Moreover, our understanding of global gene regulation is restricted by a lack of computational tools that seamlessly integrate genome-wide datasets. The overall goal of this proposal is to maximize the impact of nascent transcriptome studies and enable facile integration with other functional genomic data. My group developed native elongating transcript sequencing (NET-seq), which enables strand-specific, nucleotide-resolution mapping of RNA polymerase density, highlighting all transcriptional activity regardless of transcript half-lives and revealing the precise positions of Pol II pausing where regulatory control is applied. Here, we will develop a new version of NET-seq, NET-seq 2.0, that enables routine, scalable and flexible application to diverse human cell types (or any eukaryotic system). Moreover, we will increase the potential of NET-seq analysis by developing two innovative bioinformatics strategies that seamlessly integrate NET-seq data with other genome-wide datasets and that will have applications beyond NET-seq studies. To demonstrate the broad utility of our integrated approach, we will study regulatory networks and cell differentiation for which instantaneous nascent transcriptional analysis will be highly impactful.

In Aim 1, our goal is to make NET-seq easier, cheaper, and more flexible. Our improvements will reduce background and increase usable reads, dramatically reduce cell input requirements (100- to 1000-fold), enable dense, region-specific RNA transcription analyses, and enable quantitative comparisons between samples and conditions. In Aim 2, we will determine transcription kinetics by integrating NET-seq with metabolic RNA labeling (TT-seq) data, which report local synthesis rates. This integrative approach yields a rich transcriptional phenotype that we will use to develop gene regulatory network models. In Aim 3, we will create new computational algorithms that circumvent the need to determine each molecular event separately and instead infer the status of unmapped events using information-rich datasets, such as NET-seq. We will use integrative deep neural networks ("deep learning") that predict unavailable datasets from genome-wide data already on hand. We will apply this approach to study erythropoiesis in a well-defined primary human hematopoietic differentiation system, using time-series NET-seq and DNase-seq analysis. These data will inform deep neural net...

Key facts

NIH application ID: 10171878
Project number: 5R01HG007173-09
Recipient: HARVARD MEDICAL SCHOOL
Principal Investigator: Lee Stirling Churchman
Activity code: R01
Funding institute: NIH
Fiscal year: 2021
Award amount: $513,500
Award type: 5
Project period: 2013-04-01 → 2023-01-31