# Comprehensive functional characterization and dissection of noncoding regulatory elements and human genetic variation

> **NIH UM1** · BROAD INSTITUTE, INC. · 2020 · $1,496,387

## Abstract

Project Summary

The ENCODE project has generated comprehensive maps of the cis-regulatory elements (CREs) that control gene transcription in the human genome. These maps have been crucial to our efforts to understand sequence variants linked to human traits and disease, because most such variants are noncoding regulatory changes rather than amino acid substitutions. However, although we know the locations of thousands of CREs, our understanding of how they operate is derived from a relatively small set of well-described examples. We therefore plan to directly characterize the function of ENCODE CREs at genome-wide scale in multiple cell types. This will move the field of functional genomics from a simple map of regulatory elements toward a deep understanding of the fundamental rules of regulatory logic, down to base-pair resolution. Achieving this will dramatically expand ENCODE's utility by strengthening our ability to interpret the effects of natural human variation on gene regulation.

We propose to directly measure the regulatory activity of over 3% of the genome, focusing on loci highlighted as important by ENCODE and other functional data. We will first apply computational methods to identify the most biologically informative CREs, representing a diversity of regulatory logic and architecture, and will use machine learning to prioritize functional variants relevant to common and rare human diseases, traits, and adaptation. From these, we will select 200,000 CREs and 300,000 variants, representing 100 Mb of genomic sequence, and characterize each element's regulatory activity using the massively parallel reporter assay (MPRA). To complement the MPRA data, we will then characterize ten additional loci of 1 Mb each using CRISPR-based noncoding screens to build a comprehensive picture of these loci. This strategy leverages the throughput and flexibility of MPRA while preserving the connectivity of regulatory logic in the CRISPR-based screens, which perturb elements within their endogenous genomic context. Together, the two approaches will help us judge the accuracy and completeness of ENCODE while providing data to address a wide variety of research questions. These methods are difficult to apply at full scale in disease-relevant primary cells, so we will use the results of our MPRA and CRISPR screens to refine our models and better predict the fundamental rules of regulatory logic. We will then construct smaller, targeted libraries to test disease-specific variants in primary cells, using assays specific to each of three autoimmune diseases: type 1 diabetes, inflammatory bowel disease, and lupus.

This approach will inform the research community about the rules governing the activity of the CREs mapped by the ENCODE project, and will simultaneously provide concrete information about the function of hundreds of thousands of sequence variants relevant...
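
The scale figures in the abstract can be cross-checked with simple arithmetic. The sketch below is a back-of-envelope check, not the project's actual library design. It assumes an approximate human genome length of 3.1 Gb (not stated in the abstract) and treats the 100 Mb as divided evenly among the 500,000 tested sequences. The names `implied_mean_length` and `genome_fraction` are hypothetical helpers introduced only for illustration.

```python
# Back-of-envelope check of the design scale quoted in the abstract.
# ASSUMPTION (not from the abstract): haploid human genome of ~3.1 Gb.
# ASSUMPTION: the 100 Mb is split evenly across all tested sequences.

N_CRES = 200_000              # CREs selected for MPRA (from abstract)
N_VARIANTS = 300_000          # variants selected for MPRA (from abstract)
MPRA_BP = 100_000_000         # 100 Mb of sequence tested by MPRA (from abstract)
CRISPR_BP = 10 * 1_000_000    # ten 1-Mb loci for CRISPR screens (from abstract)
GENOME_BP = 3_100_000_000     # assumed approximate genome length


def implied_mean_length(total_bp: int, n_elements: int) -> float:
    """Hypothetical helper: average bp per tested sequence implied by the totals."""
    return total_bp / n_elements


def genome_fraction(total_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Hypothetical helper: fraction of the (assumed) genome length covered."""
    return total_bp / genome_bp


if __name__ == "__main__":
    mean_len = implied_mean_length(MPRA_BP, N_CRES + N_VARIANTS)
    print(f"implied mean length per tested sequence: {mean_len:.0f} bp")
    print(f"MPRA coverage: {genome_fraction(MPRA_BP):.2%} of genome")
    print(f"MPRA + CRISPR coverage: {genome_fraction(MPRA_BP + CRISPR_BP):.2%} of genome")
```

Under these assumptions, the MPRA set works out to an average of 200 bp per tested sequence and roughly 3.2% of the genome, consistent with the abstract's "over 3%" figure. Adding the ten 1-Mb CRISPR loci raises the total to about 3.5%.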

## Key facts

- **NIH application ID:** 9952404
- **Project number:** 5UM1HG009435-04
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Pardis Christine Sabeti
- **Activity code:** UM1
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $1,496,387
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2017-09-12 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9952404

## Citation

> US National Institutes of Health, RePORTER application 9952404, Comprehensive functional characterization and dissection of noncoding regulatory elements and human genetic variation (5UM1HG009435-04). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9952404. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
