# Decoding the regulatory architecture of the human genome across cell types, individuals and disease

> **NIH U01** · STANFORD UNIVERSITY · 2020 · $620,335

## Abstract

While accurate annotations of protein-coding regions in the human genome have been available for
many years, annotation and interpretation of regulatory sequences have lagged far behind. This is
because—in contrast to protein-coding sequences—the “rules” that govern links from genome
sequence to regulatory function are fuzzy, complex, and highly context-specific. Our limited
understanding of regulatory regions presents a fundamental challenge for the identification and
interpretation of disease variation, especially in the context of personal genome interpretation. Work
from ENCODE and other groups has started to close this gap through experimental work, including
high-resolution maps of regulatory sites in a variety of cell types, and modeling of the cell-type
specific mappings from genome sequence to regulatory function.

In this project we will develop a suite of new tools that uses these diverse new data sets to tackle
these problems. We will implement and apply powerful new machine learning methods (based on
deep learning) to interpret the genomic, context-specific encoding of regulatory information, and to
identify genetic variants that impact the encoded information. We will build models using data from a
variety of sources including ENCODE, Roadmap Epigenomics, GTEx, and regulatory variation in the
HapMap cell lines, as well as from disease cohorts. Validation experiments will be performed using a
new high-complexity CRISPR/Cas9 system developed by our team. We will develop software tools
and analytical results that can be widely used for genome interpretation, especially in analysis of
personal genomes. By the end of this study we expect to have: (1) developed powerful new
computational models for predicting regulatory function in a wide variety of cell types, at
unprecedented resolution; (2) implemented novel validation screens in native chromatin at extremely
high throughput; and (3) developed new tools for interpreting common and rare regulatory variation,
with particular focus on identification of high-impact regulatory mutations in personal genomes. We
are committed to timely release of software, data and analysis, and to working with the
ENCODE Consortium to increase the impact of data and analyses from all study sites.

## Key facts

- **NIH application ID:** 9851890
- **Project number:** 5U01HG009431-04
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** JONATHAN K PRITCHARD
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $620,335
- **Award type:** 5
- **Project period:** 2017-02-01 → 2022-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9851890

## Citation

> US National Institutes of Health, RePORTER application 9851890, Decoding the regulatory architecture of the human genome across cell types, individuals and disease (5U01HG009431-04). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9851890. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
