The Genome Aggregation Database (gnomAD)

NIH RePORTER · NIH · U24 · $2,187,000

Abstract

Project Summary: The Genome Aggregation Database (gnomAD) is a ubiquitous resource for basic research and clinical interpretation. The world’s largest genetic variation resource, the gnomAD dataset is used in virtually all clinical genetic diagnostic pipelines worldwide, and the website has received over 20 million page views to date. Here we outline a proposal that will expand the gnomAD resource to millions of samples across diverse global populations. Our proposal will scale variant calling and quality control to match this sample size, integrate statistical tools and other genomic resources critical to clinical interpretation, and ensure that the data we aggregate will continue to be shared freely with the biomedical community. To accomplish this, we will apply a highly computationally efficient strategy to call all classes of variation (including SNVs, small indels, and the mutational spectrum of structural variants) across millions of sequenced samples enriched for under-represented ancestry groups. We will deploy a cloud-based framework for the efficient storage and automated quality control of these very large and heterogeneous sequence datasets using the massively parallel Hail architecture. We will leverage the scale of gnomAD to provide increasingly high-resolution maps of the depletion of functional variation across regions of the genome (highlighting genome regions where natural selection constrains DNA change) and provide statistical frameworks for quantitatively assessing whether the population frequency of a variant is consistent with pathogenicity, linking this information with evidence from the ClinVar resource. We will continue to share all of these data as rapidly and openly as possible with the biomedical community, long before publication.
We will support and expand functionality in our widely accessed data browser as well as create scalable and publicly accessible datasets that integrate our variation data with clinical and functional genomic annotations, accessible through API frameworks to empower novel applications of the datasets. We will also provide resources and training to improve the use of gnomAD resources by the clinical genetics and wider biomedical communities.

Key facts

NIH application ID: 10089969
Project number: 1U24HG011450-01
Recipient: BROAD INSTITUTE, INC.
Principal Investigator: Mark Joseph Daly
Activity code: U24
Funding institute: NIH
Fiscal year: 2021
Award amount: $2,187,000
Award type: 1
Project period: 2021-02-08 → 2026-01-31