# Integrate gene expression data to characterize the contribution of rare genetic risk factors to structural birth defects

> **NIH R03** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2020 · $157,441

## Abstract

We aim to maximize discovery of new risk genes and elucidate the genetic architecture of structural
birth defects. To achieve that, we propose cross-disease genetic analysis of both protein-coding and
noncoding variants and integration of gene expression data to prioritize candidate risk genes.

A better understanding of the genetic basis of structural birth defects will lead to new insights into
human developmental biology and will provide targets for medical intervention and treatment. Recent
large-scale genome and exome sequencing studies of birth defects have identified new risk genes,
especially through the analysis of de novo variants in protein-coding regions. However, we are still far
from a complete understanding of the genetic causes of birth defects. An estimated 400 to 800 risk genes
of large effect size exist for birth defects such as congenital heart disease, and the vast majority
of these genes are unknown. This is primarily due to the lack of statistical power. While increasing sample
size is essential and is part of the core deliverables of the Gabriella Miller Kids First (GMKF) programs,
we also need to develop and apply new analytical methods that improve power and maximize the utility
of the available genetic data by using other types of data and biological knowledge. In addition, in most
prior studies, the analysis of rare genetic variation has been focused on small variants in the coding
regions or large copy number variants (CNVs). The data and methods to interrogate the contribution of
rare noncoding variants are rudimentary, limiting our understanding of the genetic architecture of these
diseases. In this study, we propose two aims to address these questions by leveraging GMKF cross-disease
whole-genome sequencing data sets: Specific Aim 1. Elucidate genetic architecture by cross-disease
analysis of rare coding and noncoding variants. Specific Aim 2. Integrate gene expression with genome
sequencing data to improve discovery and biological interpretation of risk genes of structural birth
defects.

The proposed study will maximize the genetic discovery potential of the GMKF WGS data sets for
birth defects and improve our understanding of the pleiotropic effects and tissue specificity of risk genes
and variants. The analytical approaches developed in this study will be applicable to genetic data of birth
defects and developmental disorders from future GMKF cohorts and other programs.

## Key facts

- **NIH application ID:** 9882319
- **Project number:** 5R03HL147197-02
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Yufeng Shen
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $157,441
- **Award type:** 5
- **Project period:** 2019-04-01 → 2021-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9882319

## Citation

> US National Institutes of Health, RePORTER application 9882319, Integrate gene expression data to characterize the contribution of rare genetic risk factors to structural birth defects (5R03HL147197-02). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/9882319. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
