# Elucidation of the organizing principles of the regulatory genome through large-scale data integration

> **NIH R35** · ALTIUS INSTITUTE FOR BIOMEDICAL SCIENCES · 2020 · $369,847

## Abstract

The human genome contains the structural and operational instructions for living cells, yet exactly what these
instructions are and how they are utilized and encoded in the primary genomic sequence is poorly understood.
Arguably the only well-understood portions of the genome are protein-coding regions, which make up less than
2% of the genome. It has become increasingly clear that the non-coding genome encodes vast numbers of
regulatory elements important for controlling gene expression levels in a cell-type-specific manner. Moreover,
the overwhelming majority of disease- and trait-associated variants identified by genome-wide association
studies (GWAS) lie in non-coding regions of the genome, and are strongly enriched in regulatory elements.
Despite this clear relevance, we still lack a complete understanding of the global organizing principles of the
regulatory genome, such as how regulatory elements are distributed across the genome, what their occurrence
patterns are across cell types, and how they are encoded in the genomic sequence. We hypothesize that the
main reason for our limited understanding is not lack of data, but that most data sets are generated and
ultimately analyzed in isolation, limiting their full potential. To further our understanding of the organizing
principles of the regulatory genome, it is therefore essential to take an en masse approach to data analysis,
exploiting the dynamics across large numbers of observations. In this project, we will use this notion to
develop methods for defining the first comprehensive and pragmatically useful human regulatory genome
annotation based on the coordinated occurrence patterns of regulatory elements across hundreds of cell types
and states. Beyond individual elements, we will define multi-kilobase domains of shared regulatory activity,
which will shed light on the regulatory landscapes around genes and higher-order regulatory domains. In
addition, we will integrate regulatory annotations with orthogonal information based on functional genomics
chromatin state data to arrive at a rich composite view of the regulatory genome. Lastly, we will develop the
first fully data-driven system for designing and validating context-specific synthetic regulatory elements. We
anticipate that our results will provide a new lens on the human regulatory genome, which will open up new
research avenues in the areas of systems and synthetic biology, ultimately contributing to the understanding
and treatment of human disease. We are determined to provide the genomics community with pragmatically
useful regulatory genome annotations and tools to utilize these resources.

## Key facts

- **NIH application ID:** 10048667
- **Project number:** 1R35HG011317-01
- **Recipient organization:** ALTIUS INSTITUTE FOR BIOMEDICAL SCIENCES
- **Principal Investigator:** Wouter Meuleman
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $369,847
- **Award type:** 1
- **Project period:** 2020-09-01 → 2025-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10048667

## Citation

> US National Institutes of Health, RePORTER application 10048667, Elucidation of the organizing principles of the regulatory genome through large-scale data integration (1R35HG011317-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10048667. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*