# Statistical analysis of large genomic data sets

> **NIH R01** · CASE WESTERN RESERVE UNIVERSITY · 2020 · $404,435

## Abstract

Heritability analysis in the largest whole genome sequence (WGS) dataset, the NHLBI
Trans-omics for Precision Medicine Whole Genome Sequencing Program (TOPMed),
strongly suggested that “missing heritability” can be attributed to rare variants that are
not well captured by array-based genotyping. Large genome-wide association
studies (GWAS), complemented by WGS studies, will be a
cost-efficient strategy to identify genetic variants and understand the genetic architecture
of complex traits. Multiple large biobanks with SNP-array data and whole genome
sequencing data, such as TOPMed, provide an unprecedented but challenging
opportunity to understand the genetic mechanisms underlying complex diseases. We
have identified three pressing challenges in utilizing large GWAS and WGS datasets and
propose the following four specific aims to meet the challenges: 1) Differentiate
horizontal pleiotropy from mediation using GWAS summary statistics and apply the
methods to publicly available data. 2) Prioritize genetic variants sensitive to interactions,
and estimate the overall contribution of interactions to a phenotype. 3) Incorporate family
linkage/local ancestry to identify genetic variants in the TOPMed whole genome
sequencing data. 4) Develop corresponding software that will be made publicly
available. We will apply our new analytic methods to TOPMed WGS, UK Biobank data
and many existing GWAS summary statistics. Our data analysis will focus on blood
pressure, obesity and sleep disorders, and their effects on disease outcomes such as
cardiovascular disease, diabetes, heart failure and dementia.

## Key facts

- **NIH application ID:** 9943545
- **Project number:** 1R01HG011052-01
- **Recipient organization:** CASE WESTERN RESERVE UNIVERSITY
- **Principal Investigator:** XIAOFENG ZHU
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $404,435
- **Award type:** 1
- **Project period:** 2020-05-08 → 2024-02-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9943545

## Citation

> US National Institutes of Health, RePORTER application 9943545, Statistical analysis of large genomic data sets (1R01HG011052-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9943545. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*