pleioR: A powerful and fast test and software for the study of pleiotropy in systems involving many traits with biobank-sized data

NIH RePORTER · NIH · R03 · $78,250

Abstract

Pleiotropy (i.e., a single variant affecting multiple traits) gives rise to genetic correlations between traits and underlies the development of many syndromes. Identifying variants with pleiotropic effects on health-related traits can improve the biological understanding of gene action and disease etiology, and can help advance disease-risk prediction. However, mapping pleiotropic risk loci is statistically and computationally challenging. Schaid et al. (Genetics, 2016) proposed an intersection-union sequential test that addresses the statistical challenges arising in multi-trait genome-wide association analyses. Schaid’s sequential Likelihood Ratio Test (sLRT) is powerful, provides adequate error control, and yields easy-to-interpret results. However, adoption of the methodology remains limited because neither the test nor the existing software scales to big data (hundreds of thousands of individuals, millions of SNPs, and many traits). Therefore, we propose to develop an alternative to the sLRT that achieves the same power but involves computations that scale to big data. Our approach adopts the intersection-union sequential testing framework but uses a Wald test and an approximation that substantially reduces the computational burden. Preliminary results presented in this grant show that the proposed test, and the beta C++ implementation we developed, match the power and error control of the sLRT, run considerably faster (by a factor of about 300), and scale to big data. In this project, (Aim 1) we will conduct extensive simulations to assess the statistical properties of the proposed test. (Aim 2) We will integrate memory mapping with optimized in-memory computations to develop open-source software that implements the proposed test within the R environment, in a package that scales to big-data analysis. (Aim 3) Finally, we will use the methods and software developed in Aims 1 and 2, together with data from the UK Biobank, to study the genetic underpinnings of Metabolic Syndrome.

The advent of biobank data has opened unprecedented opportunities for mapping genetic loci that affect complex biological networks. However, more efficient data-analysis tools are needed to unleash the potential of modern biobanks. This proposal will: (i) develop novel methods for mapping risk loci that affect systems of traits; (ii) develop, and share with the research community, software for analyzing multidimensional phenotypes with big data; and (iii) advance knowledge of the genetic basis of Metabolic Syndrome.
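The intersection-union sequential logic can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the pleioR implementation or its API. It assumes that, for one SNP, per-trait effect estimates `beta` and their sampling covariance `V` are already available (for example, from a multivariate regression). At step k it tests the null "at most k traits are affected". Because that null is a union over every size-k subset of traits allowed to have non-zero effects, it is rejected only if the Wald test of the remaining effects rejects for every such subset, meaning the largest p-value across subsets falls below alpha. The subsets are enumerated exhaustively. The approximation the project proposes is not described here and is not reproduced. The names `chi2_sf`, `wald_quad_form` and `sequential_wald_count` are hypothetical.

```python
import itertools
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exact closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series (exact).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def wald_quad_form(V, b):
    """Return the Wald statistic b' V^{-1} b via Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(V[i]) + [b[i]] for i in range(n)]  # augmented matrix [V | b]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution solves V x = b
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return sum(bi * xi for bi, xi in zip(b, x))


def sequential_wald_count(beta, V, alpha=0.05):
    """Hypothetical sketch: estimate how many traits one SNP affects.

    Step k tests H0_k: 'at most k effects are non-zero' (intersection-union test).
    Returns (estimated count, list of the max p-value obtained at each step).
    """
    m = len(beta)
    max_pvals = []
    for k in range(m):
        worst_p = 0.0
        for S in itertools.combinations(range(m), k):  # traits allowed to be non-zero
            rest = [j for j in range(m) if j not in S]
            b = [beta[j] for j in rest]
            V_rest = [[V[i][j] for j in rest] for i in rest]
            stat = wald_quad_form(V_rest, b)  # Wald test of beta[rest] == 0
            worst_p = max(worst_p, chi2_sf(stat, len(rest)))
        max_pvals.append(worst_p)
        if worst_p >= alpha:
            return k, max_pvals  # cannot reject 'at most k traits'
    return m, max_pvals
```

For example, with three uncorrelated estimates whose z-scores are 5, 4 and 0.1 (`beta = [0.5, 0.4, 0.01]`, `V` diagonal with 0.01), the sketch rejects "no trait" and "at most one trait" and stops at k = 2. It therefore estimates that the SNP affects two traits. Exhaustive enumeration needs roughly 2^m Wald tests per SNP for m traits. That cost is one reason a cheaper approximation matters when millions of SNPs and many traits are analyzed.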

Key facts

NIH application ID: 10424541
Project number: 5R03HG011674-02
Recipient: MICHIGAN STATE UNIVERSITY
Principal Investigator: Gustavo de los Campos
Activity code: R03
Funding institute: NIH
Fiscal year: 2022
Award amount: $78,250
Award type: 5
Project period: 2021-06-15 → 2023-05-31