# Identification of causal non-coding variants in childhood genetic disorders

> **NIH R35** · ST. JUDE CHILDREN'S RESEARCH HOSPITAL · 2020 · $448,750

## Abstract

PROJECT SUMMARY
My long-term research goal is to understand the organization and function of cis-regulatory modules (CRMs) in
the human genome, with a focus on their impact on development and disease. CRMs, such as promoters,
enhancers, and insulators, are DNA elements that regulate gene expression. Genome-wide association studies
(GWAS) have shown that most variants associated with a phenotype or disease are located outside of protein-
coding regions and are postulated to affect gene expression levels through CRMs. Therefore, understanding the
organization and function of CRMs is key to identifying the causes of genetic diseases and providing an essential
backbone for precision medicine. Even though millions of putative CRMs have recently been identified with the
help of high-throughput assays, it remains challenging to pinpoint functional CRMs that regulate tissue and
developmental stage-specific transcription. In fact, a large proportion of the CRM variants identified so far have
no-to-mild effects on the phenotype. As a result, those insights have very limited clinical application. Over the
next five years, the goal of my research is to accurately identify causal CRM variants that affect normal blood
cell development and impact childhood blood disorders. Several major hurdles must be overcome to achieve
this goal. First, mounting evidence indicates that expression fluctuation is an important trait of genes.
Importantly, the tolerance to expression fluctuation varies among genes. We reason that CRMs
modulating transcription of highly expression-sensitive genes tend to be essential to cell function and harbor
pathological non-coding variants. However, our understanding of expression-sensitive genes and their
underlying biology is still rudimentary. Secondly, different epigenetic modification markers are routinely used to
map potential CRMs. However, in many loci, those epigenetic markers are not required for CRM function.
Overreliance on associative, instead of causative, markers can confound the accurate identification of biologically
important CRMs. Thirdly, while the genetic code of protein-coding sequences has been known for decades,
a similar “grammar” for non-coding sequences, and CRMs in particular, is still lacking. As a result, we are not
able to predict how CRM variants affect their regulatory functions. Based on those challenges, we ask three
fundamental questions: 1) How to systematically identify expression-sensitive genes? 2) How to decipher the
causative mechanism of CRMs? 3) How can single-nucleotide variants (SNVs) affect CRM functions? If
successful, the proposed studies will identify functionally important CRMs controlling health-related traits and
pinpoint pathological non-coding variants within those CRMs. A better understanding of the anatomy and function of
CRMs will facilitate precision medicine by allowing us to treat genetic diseases by manipulating CRM function
via gene editing or pharmacological approache...

## Key facts

- **NIH application ID:** 9997981
- **Project number:** 5R35GM133614-02
- **Recipient organization:** ST. JUDE CHILDREN'S RESEARCH HOSPITAL
- **Principal Investigator:** Yong Cheng
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $448,750
- **Award type:** 5
- **Project period:** 2019-09-01 → 2024-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9997981

## Citation

> US National Institutes of Health, RePORTER application 9997981, Identification of causal non-coding variants in childhood genetic disorders (5R35GM133614-02). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9997981. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*