# Big Data Methods for Decoding Gene Regulation

> **NIH R01** · JOHNS HOPKINS UNIVERSITY · 2021 · $424,118

## Abstract

A comprehensive understanding of how genes' activities are controlled temporally and spatially is crucial for
studying human development and diseases. Transcription factors (TFs) are an important class of regulatory
proteins that can control genes' transcriptional activities by binding to target genes' regulatory DNA sequences
called cis-regulatory elements (CREs). A map of genome-wide activities of CREs, or “regulome”, in all cell
types and biological conditions will provide a foundation for investigating the basic operating rules of biology,
interpreting how genetic variants cause diseases, and guiding the development of disease treatment strategies.
Unfortunately, existing experimental regulome mapping technologies cannot analyze a large number of samples
efficiently. Thus far, they have only been applied to map regulomes in a small fraction of all biological contexts.
As a result, a comprehensive map of the human regulatory landscape is still lacking.

This study aims to develop a solution for mapping regulomes in a massive number of biological samples from
diverse cell types and conditions by leveraging publicly available functional genomic data. We will use the rich
gene expression and regulome data generated by the Encyclopedia of DNA Elements (ENCODE) project to
develop a new prediction approach that predicts a biological sample's regulome using its transcriptome (Aim 1).
We will then apply the trained prediction models to 290,000+ publicly available human gene expression samples
in the Gene Expression Omnibus (GEO) database to create a regulome map that covers hundreds of thousands
more biological contexts than existing regulome data (Aim 2). We will also develop a method to help researchers
explore the massive datasets to gain biological insights into gene regulation by projecting the data onto a
low-dimensional structure that reflects their developmental trajectory (Aim 3).

Our research will create new analytical methods for predicting ultra-high-dimensional outcomes using
ultra-high-dimensional predictors, making cross-platform predictions when the training and application data are
generated by different technological platforms with systematic platform differences, and retrieving the low-dimensional
spanning tree structure from a massive dataset. Applying these new methods to the vast amounts of publicly
available gene expression data will allow us to address a major challenge in regulome mapping that cannot be
solved using existing experimental technologies. By enabling fast and cost-efficient mapping and analysis of
the human gene regulatory landscape, the proposed research can have a major impact on future studies of human
development and diseases.

## Key facts

- **NIH application ID:** 10171879
- **Project number:** 5R01HG009518-04
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** Hongkai Ji
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $424,118
- **Award type:** 5
- **Project period:** 2018-08-10 → 2024-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10171879

## Citation

> US National Institutes of Health, RePORTER application 10171879, Big Data Methods for Decoding Gene Regulation (5R01HG009518-04). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10171879. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*