Methods to enable robust and efficient use of genetic summary data

NIH RePORTER · NIH · R35 · $408,016

Abstract

Publicly available genetic summary data can have high utility for providing insight into the genetic etiology of health and disease. Databases of genotype frequencies, such as the Genome Aggregation Database (gnomAD), are used to prioritize putative causal variants and, more recently, as pseudo-controls in case-control analysis. Genome-wide association study (GWAS) test statistics are used in a variety of secondary data analyses, including polygenic risk scores (PRS), genetic correlation analysis, and fine mapping of causal variants. Compared with individual-level data, genetic summary data often have fewer barriers to access, promoting broad use of these valuable data resources. However, the availability and use of genetic summary data is often not equitable across all ancestral groups, especially for understudied ancestral groups that have little to no representation within these resources. Furthermore, heterogeneity within the summary data can lead to confounding and reduced power for case-control analysis, incorrect prioritization of putative causal variants for rare diseases, and reduced accuracy for polygenic risk scores. I develop robust and efficient methods to appropriately use genetic summary data while estimating, modeling, and harnessing the heterogeneity within. My methods coalesce around a unifying framework in which I flip the paradigm of genetic and genomic data, treating the genetic variant or element, rather than the individual, as the observational unit by which we analyze the data. This simple yet innovative paradigm shift enables the use of classical statistical techniques and the creation of methods that detect, adjust for, and even use heterogeneity within summary-level data. To enable broad and equitable use of our methods, we will create publicly available R packages compatible with Bioconductor and Shiny apps for interactive internet use.

Key facts

NIH application ID
10653969
Project number
5R35HG011293-04
Recipient
UNIVERSITY OF COLORADO DENVER
Principal Investigator
Audrey E Hendricks
Activity code
R35
Funding institute
NIH
Fiscal year
2023
Award amount
$408,016
Award type
5
Project period
2020-09-01 → 2025-06-30