Scalable methods for the characterization and analysis of families in large genomic datasets

NIH RePORTER · NIH · R35 · $357,730

Abstract

Numerous studies of common genetic diseases in humans are now analyzing very large genomic datasets with information from up to 500,000 individuals. These large studies pose challenges to traditional analysis approaches, especially in terms of computational runtime scaling, but they also afford opportunities for refined inference and necessitate the development of new computational methods. The program of research we will undertake focuses on the emerging opportunities of widespread relatedness in large studies. We are currently developing a method to efficiently infer identical-by-descent (IBD) sharing using an algorithm that does not require phased data. We are also finalizing a method that distinguishes among second-degree relative types: half-sibling, avuncular, and grandparent-grandchild pairs. Building on these models, we will develop novel, efficient methods to: (1) identify pedigrees that define close relationships within large datasets; (2) fundamentally advance genome-wide association studies (GWAS) by inferring the genomes of parents of sets of siblings and other relatives; (3) leverage recombination patterns in men and women to infer the parent-of-origin of haplotypes in a set of close relatives; and (4) infer haplotypes by jointly modeling both family- and population-level structure. Notably, no method we are aware of enables the reconstruction of parent haplotypes without parent data; such reconstruction will improve GWAS power by utilizing individuals for whom more complete health history information is known. Furthermore, few studies of parent-of-origin associations have been done in humans because of the difficulty of obtaining parent data, but we will perform these analyses in large studies even without parent data. All software will be made freely available to the public and distributed under open source software licenses.

Key facts

NIH application ID: 10228676
Project number: 5R35GM133805-03
Recipient: CORNELL UNIVERSITY
Principal Investigator: Amy Lynne Williams
Activity code: R35
Funding institute: NIH
Fiscal year: 2021
Award amount: $357,730
Award type: 5
Project period: 2019-09-01 → 2022-07-31