Statistical Models for Genetic Studies, Using Network and Integrative Analysis

NIH RePORTER · NIH · R01 · $332,796

Abstract

Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, in some cases providing clinical benefits to patients through novel biomarkers and therapeutic targets. However, investigation of complex traits often suffers from limited statistical power due to polygenicity, high dimensionality, and moderate sample sizes. While it is practically challenging and costly to recruit enough patients to attain a sample size sufficient to identify all associated genetic variants, we recently showed that statistical power to identify risk-associated genetic variants can be significantly increased by 1) considering the genetic basis shared among multiple phenotypes, namely pleiotropy, and 2) incorporating genomic and genetic annotation data. However, effective integration of these datasets becomes statistically more challenging as the number of genetic studies and annotation datasets increases. The objective of this proposal is to develop statistical methods and software to improve identification and interpretation of risk variants and to promote understanding of genetic relationships among phenotypes. This objective will be attained by pursuing four specific aims. In Aim 1, we will develop a Bayesian graphical model to identify risk variants and construct a phenotype network by integrating multiple GWAS datasets with various annotation data. In Aim 2, we will develop a Bayesian graphical model to build a phenotype network from biomedical literature. In Aim 3, we will develop a statistical method to construct meta-annotations that can effectively summarize high-dimensional annotation data without losing interpretability. In Aim 4, we will apply these methods to genetic studies of vascular complications and autoimmune diseases in African American populations, using PubMed literature and various annotation datasets.
The proposed research is innovative because it introduces a novel statistical framework that integrates multiple GWAS, biomedical literature, and annotation datasets to improve identification and interpretation of risk variants. The proposed research is significant because it is expected to help improve diagnosis and treatment of diseases through more effective identification of risk variants and an enhanced understanding of common etiology among diseases.

Key facts

NIH application ID
9920162
Project number
5R01GM122078-06
Recipient
OHIO STATE UNIVERSITY
Principal Investigator
Dongjun Chung
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$332,796
Award type
5
Project period
2016-07-21 → 2022-04-30