# Novel statistical methods and tools to integrate multiple endophenotypes and functional annotation data to study the roles of rare variants in complex human diseases using sequencing data

> **NIH R01** · YALE UNIVERSITY · 2020 · $328,257

## Abstract

In the past fifteen years, great efforts have been made to understand the genetic architecture of complex
human diseases through genome-wide association studies. Although many genome-wide significant variants
have been identified, the heritability or variance explained by these variants remains very small. This suggests
substantial missing heritability that may yet be explained by common genetic variants with smaller effect sizes
and/or rare and low-frequency variants, and it calls for the development and application of novel statistical
methods for whole genome/exome sequencing data collected from deeply phenotyped cohorts. In this project,
we will develop methods that leverage multiple correlated endophenotypes and further integrate functional
annotation data to identify novel rare variants for complex traits. We will develop a set of new computational
and analytical tools that are practically useful and broadly applicable to general sequencing studies. Applying
our methods will likely identify novel rare variant associations and shed new light on the
genetics of cardiometabolic diseases.
In Aim 1, we propose to develop novel statistical methods to integrate multiple endophenotypes to study the
impact of rare variants on complex human diseases. Our methods will fill the gap between the current
practice of association studies and the practical needs of integrating endophenotypes for improved
understanding and diagnosis of clinical outcomes. In Aim 2, we will extend the methods to meta-analyses
across studies. In Aim 3, we will develop a novel kernel machine learning approach that integrates various
sources of functional information to annotate whole-genome regions, and we will use these annotations to develop a dynamic
whole-genome scan test to detect rare variant associations with multiple endophenotypes. We will leverage the
NHLBI TOPMed whole genome sequencing (WGS) data and the UK Biobank whole exome sequencing (WES)
data, and integrate the functional annotation data to identify and dissect the role of rare variants in
cardiometabolic traits (Aim 4). Our proposed work is cost-effective as it leverages the existing WGS/WES
samples and functional annotation data while providing methods and tools that are broadly applicable to other
studies, and builds on a strong team of scientists with a proven track record in statistical genetics, large-scale
genetic studies, and cardiometabolic traits. We expect our methods will lead to the discovery of many more
rare and low frequency variants for these traits. These results will offer new insights to help design more
effective treatment and prevention strategies. All our proposed methods will be disseminated to the public
through well-tested and publicly available software (Aim 5).

## Key facts

- **NIH application ID:** 9972090
- **Project number:** 1R01GM134005-01A1
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** Baolin Wu
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $328,257
- **Award type:** 1
- **Project period:** 2020-06-01 → 2024-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9972090

## Citation

> US National Institutes of Health, RePORTER application 9972090, Novel statistical methods and tools to integrate multiple endophenotypes and functional annotation data to study the roles of rare variants in complex human diseases using sequencing data (1R01GM134005-01A1). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9972090. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
