# The Genome Aggregation Database (gnomAD)

> **NIH U24** · BROAD INSTITUTE, INC. · 2021 · $2,187,000

## Abstract

The Genome Aggregation Database (gnomAD) is a ubiquitous resource for basic research and clinical
interpretation. The world’s largest genetic variation resource, the gnomAD dataset is used in virtually all clinical
genetic diagnostic pipelines worldwide, and the website has over 20 million page views to date. Here we
outline a proposal that will expand the gnomAD resource to millions of samples across diverse global
populations. Our proposal will scale variant-calling and quality control to match this sample size, integrate
statistical tools and other genomic resources critical to clinical interpretation, and ensure that the data we
aggregate will continue to be shared freely with the biomedical community. To accomplish this we will apply a
highly computationally efficient strategy to call all classes of variation (including SNVs, small indels, and the
mutational spectrum of structural variants) across millions of sequenced samples enriched for
underrepresented ancestry groups. We will deploy a cloud-based framework for the efficient storage and automated
quality control of these very large and heterogeneous sequence data sets using the massively parallel Hail
architecture. We will leverage the scale of gnomAD to provide increasingly high-resolution maps of the
depletion of functional variation across regions of the genome (highlighting genome regions where natural
selection constrains DNA change) and provide statistical frameworks for quantitatively assessing whether the
population frequency of a variant is consistent with pathogenicity, linking this information with evidence from
the ClinVar resource. We will continue to share all of this data as rapidly and openly as possible with the
biomedical community, long before publication. We will support and expand functionality in our widely
accessed data browser as well as create scalable and publicly accessible datasets that integrate our variation
data with clinical and functional genomic annotations, accessible through API frameworks to empower novel
applications of the datasets. We will also provide resources and training to improve the use of gnomAD
resources by the clinical genetics and wider biomedical communities.

## Key facts

- **NIH application ID:** 10089969
- **Project number:** 1U24HG011450-01
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Mark Joseph Daly
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $2,187,000
- **Award type:** 1
- **Project period:** 2021-02-08 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10089969

## Citation

> US National Institutes of Health, RePORTER application 10089969, The Genome Aggregation Database (gnomAD) (1U24HG011450-01). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10089969. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*