Extensive multiplexing of protein nucleic-acid interactions to comprehensively study gene expression regulation from chromatin to mRNA degradation

NIH RePORTER · NIH · R01 · $738,499

Abstract

Project Summary

Gene expression is tightly controlled at both the RNA and protein level by mechanisms involving chromatin modification, transcriptional regulation, mRNA splicing, processing, translation, and degradation. Each of these processes is regulated by nucleic acid-protein interactions (DNA-protein and RNA-protein). Accordingly, there have been tremendous efforts in the scientific community to comprehensively map these interactions, including major international research efforts (e.g., ENCODE, Roadmap Epigenomics) focused on generating reference maps for specific cell types. However, because these binding maps are highly specific to individual cell types, there is a critical need to enable an individual lab to generate comprehensive genomic maps for any cell type of interest, including primary cell types, disease models, and other rare cell populations. This goal remains challenging because existing assays can map the interactions of only a single protein at a time and are therefore prohibitively expensive.

To address these issues, this proposal will develop a highly innovative technology based on our split-pool barcoding strategy (SPRITE, Split-Pool Recognition of Interactions by Tag Extension) that maps multiway protein-nucleic acid interactions using high-throughput sequencing. The proposed Hi-P technology will be used to establish: (i) a highly multiplexed eCLIP-seq method to map up to hundreds of RNA binding proteins simultaneously to their RNA binding sites, (ii) a highly multiplexed ChIP-seq method to map up to hundreds of DNA binding proteins and histone modifications to their DNA binding sites, and (iii) methods to map these multiple protein-nucleic acid interactions across many samples, including rare cell types, simultaneously. The proposed technology represents a major advance: it will dramatically increase the scale of existing methods and create new capabilities that are currently not possible.
These tools will empower individual researchers to generate detailed genomic datasets in specific biological and disease contexts that are comparable in size and complexity to those generated by the ENCODE project, at a tiny fraction of its cost. More generally, we anticipate that these tools will lead to critical new insights into gene regulation and human disease.
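The core idea behind split-pool barcoding can be illustrated with a small simulation. This is a generic sketch, not the SPRITE or Hi-P protocol: it assumes each crosslinked complex stays intact through every round, lands in a uniformly random well each round, and tags all of its molecules with perfect efficiency. The function names, the molecule labels, and the 96-well by 4-round design are hypothetical, chosen only for illustration. Molecules that end up sharing a full barcode are inferred to have been in the same complex, which is what allows many protein-nucleic acid pairs to be read out in one experiment.

```python
import random
from collections import defaultdict


def split_pool_barcode(complexes, n_wells, n_rounds, seed=0):
    """Simulate split-pool barcoding of crosslinked complexes (hypothetical helper).

    Each complex is a list of molecule labels. In every round the whole
    complex lands in one randomly chosen well, so all of its molecules
    receive the same tag; the sequence of well indices forms the barcode.
    Returns a list of (barcode, molecule) "reads".
    """
    rng = random.Random(seed)
    reads = []
    for members in complexes:
        barcode = tuple(rng.randrange(n_wells) for _ in range(n_rounds))
        reads.extend((barcode, molecule) for molecule in members)
    return reads


def group_by_barcode(reads):
    """Reconstruct clusters: molecules sharing a full barcode are
    inferred to come from the same complex."""
    clusters = defaultdict(set)
    for barcode, molecule in reads:
        clusters[barcode].add(molecule)
    return dict(clusters)


def barcode_space(n_wells, n_rounds):
    """Number of distinct barcodes after n_rounds with n_wells each."""
    return n_wells ** n_rounds


def expected_collisions(n_complexes, n_wells, n_rounds):
    """Expected number of complex pairs sharing a barcode by chance,
    assuming uniform, independent well assignment in every round."""
    pairs = n_complexes * (n_complexes - 1) / 2
    return pairs / barcode_space(n_wells, n_rounds)


# Hypothetical complexes: a protein-specific tag plus the nucleic acid
# it was crosslinked to. Labels are illustrative only.
complexes = [
    ["tag:H3K4me3", "DNA:chr1:10500"],
    ["tag:CTCF", "DNA:chr7:88200"],
    ["tag:HNRNPC", "RNA:GAPDH:exon3"],
]
reads = split_pool_barcode(complexes, n_wells=96, n_rounds=4)
clusters = group_by_barcode(reads)
print(barcode_space(96, 4))  # 84934656
```

The design choice worth noting is that the barcode space grows as wells to the power of rounds, so adding a round is far cheaper than adding wells. `expected_collisions` shows the trade-off: the chance that two unrelated complexes share a barcode falls as that space grows relative to the number of complexes.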

Key facts

NIH application ID: 10344678
Project number: 1R01HG012216-01
Recipient: COLUMBIA UNIV NEW YORK MORNINGSIDE
Principal Investigator: Mitchell Guttman
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $738,499
Award type: 1 (new application)
Project period: 2022-02-01 → 2026-01-31