Statistical Analysis of Large Genomic Data Sets

NIH RePORTER · NIH · R01 · $466,949

Abstract

Recent large genome-wide association studies (GWAS) have identified hundreds to thousands of genetic variants associated with complex traits. The resulting GWAS summary statistics, together with large biobank data, provide an unprecedented opportunity to understand the genetic mechanisms of complex traits. Inferring the causal effects of risk factors on disease is the major challenge of observational epidemiology studies, and it can now be addressed using genomic data through the cost-efficient Mendelian randomization (MR) approach. However, current MR approaches suffer from bias from multiple sources, including weak instrumental variables (IVs), sample overlap, horizontal pleiotropy, and linkage disequilibrium (LD) among IVs. Novel statistical methods that can infer causality and estimate causal effects without bias are therefore needed. At the same time, one of the dominant views in the field is that the genetic variation of complex disease is largely explained by additive effects. Even though gene-environment and gene-gene interactions have been well documented in experimental studies, the contribution of interactions is still unclear, partly because of the limitations of current analytic approaches. Current methodological development focuses on improving computational efficiency to overcome the burden of the large number of interaction tests at the genome-wide level, but the underlying method is standard linear regression, which often has low statistical power. In this project, we will develop (1) novel unbiased multivariable MR methods with application to large genomic and biobank data, (2) novel powerful gene-environment interaction (𝐺 × 𝐸) methods with application to large genomic and biobank data, (3) a novel powerful gene-gene interaction (𝐺 × 𝐺) method with application to large genomic and biobank data, and (4) corresponding software that will be made publicly available.
We will apply these methods and software to UK Biobank, TOPMed WGS, and All of Us, as well as many existing GWAS summary statistics datasets. We request support to develop statistical methods and software to address these goals. The proposed novel multivariable MR methods and 𝐺 × 𝐸 and 𝐺 × 𝐺 methods would speed up new discoveries and improve our understanding of the genetic architecture of complex traits, which aligns with the mission of the National Human Genome Research Institute.

Key facts

NIH application ID: 10876725
Project number: 2R01HG011052-05
Recipient: CASE WESTERN RESERVE UNIVERSITY
Principal Investigator: XIAOFENG ZHU
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $466,949
Award type: 2
Project period: 2020-05-08 → 2028-04-30