# Scalable methods for the characterization and analysis of families in large genomic datasets

> **NIH R35** · 23ANDME, INC. · 2023 · $266,646

## Abstract

Numerous studies of common genetic diseases in humans are now analyzing very large genomic datasets with information from up to 500,000 individuals. These large studies pose challenges to traditional analysis approaches, especially in terms of computational runtime scaling, but also afford opportunities for refined inference, and necessitate the development of new computational methods. The program of research we will undertake focuses on the emerging opportunities of widespread relatedness in large studies. We are currently developing a method to efficiently infer identical by descent (IBD) sharing using an algorithm that does not require phased data. We are also finalizing a method that distinguishes among second-degree relative types: half-sibling, avuncular, and grandparent-grandchild pairs. Building on these models, we will develop novel, efficient methods to: (1) identify pedigrees that define close relationships within large datasets; (2) fundamentally advance genome-wide association studies (GWAS) by inferring the genomes of parents of sets of siblings and other relatives; (3) leverage recombination patterns in men and women to infer the parent-of-origin of haplotypes in a set of close relatives; and (4) infer haplotypes by jointly modeling both family- and population-level structure. Notably, no method we are aware of enables the reconstruction of parent haplotypes without parent data, and this will enable improved GWAS power by utilizing individuals for whom more complete health history information is known. Furthermore, few studies of parent-of-origin associations have been done in humans because of the difficulty of obtaining parent data, but we will perform these analyses in large studies even without parent data. All software will be made freely available to the public and distributed under open source software licenses.

## Key facts

- **NIH application ID:** 10706540
- **Project number:** 5R35GM133805-05
- **Recipient organization:** 23ANDME, INC.
- **Principal Investigator:** Amy Lynne Williams
- **Activity code:** R35
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $266,646
- **Award type:** 5
- **Project period:** 2019-09-01 → 2024-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10706540

## Citation

> US National Institutes of Health, RePORTER application 10706540, Scalable methods for the characterization and analysis of families in large genomic datasets (5R35GM133805-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10706540. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
