# Analysis of Genomic and Complex Data

> **NIH R01** · YALE UNIVERSITY · 2021 · $361,654

## Abstract

The advent of genomic and imaging technologies provides us with a great opportunity to study and understand
health conditions, including substance use and mental illnesses, which are complex and depend on both
genetic and environmental factors. In the past decades, genome-wide association (GWA) studies have identified
and robustly replicated numerous genetic variants that are associated with complex diseases. Despite those
successes, it remains persistently difficult to identify genes and environmental factors--the so-called
geneticist's nightmare. Most of the identified variants have low associated risks and account for little
heritability, and there is increasing attention focused on finding the "missing heritability" of complex diseases.
Furthermore, it is documented that clinical contributions from neuropsychiatric research have been minimal
due to traditionally small sample sizes of studies, biologically incorrect diagnostic labels, comorbidity and
heterogeneity of the diseases. To address these problems and advance clinical science, we need to develop
novel models and methods to efficiently use and understand the available data. This is the primary motivation
for our project. We will develop more efficient approaches that utilize biological information (genetic and/or
phenotypic data) and directly address the comorbidity issue. In addition, we will analyze large datasets such
as UK BioBank with demographic, clinical, and genetic data. We will further take advantage of the
investigators' many years of experience in the data collection and analysis of GWA studies and build on our
successes in the development and applications of statistical methods and software for complex studies. The
primary aim of this application is to develop, evaluate, and apply new statistical (both parametric and
nonparametric) models, methods, and software to conduct genetic analyses of complex diseases. To deal with
the challenges stated above, our proposed methods will address one or more of the following topics: (a)
analysis of genetic, phenotypic, and environmental data; (b) modeling comorbidity through multivariate traits;
and (c) identification and incorporation of novel genetic variants including their interactions with environmental
factors by using and developing state-of-the-art statistical methodology and software, such as trees and
forests. The success of our project will have a direct impact on our understanding and, ultimately, the treatment
and prevention of diseases that are of significant public health concern.

## Key facts

- **NIH application ID:** 10115096
- **Project number:** 5R01HG010171-03
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** HEPING ZHANG
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $361,654
- **Award type:** 5
- **Project period:** 2019-05-08 → 2023-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10115096

## Citation

> US National Institutes of Health, RePORTER application 10115096, Analysis of Genomic and Complex Data (5R01HG010171-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10115096. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
