# Maintaining, improving, and providing the human reference

> **NIH U41** · WASHINGTON UNIVERSITY · 2020 · $2,386,814

## Abstract

PROJECT SUMMARY (Project 1: Maintaining, improving, and providing the human reference)
We propose to combine the best current methods and practices with a practical, scalable model to create
and share a broadly useful pan-human genome reference (the "pan-genome") based on the assemblies and
raw data delivered by the genome production center. The pan-genome reference we propose does not wholly
replace the existing GRCh38 reference; rather, it substantially builds upon it to create a reference resource
that incorporates a much richer representation of human genetic diversity. To deliver this plan, we have
assembled a strong group of individuals with complementary expertise that together will construct, share,
maintain and improve a state-of-the-art human pan-genome resource. To convert the initial genome
assemblies into a high-quality human genome reference cohort we will perform assembly quality control,
error correction and validation, mirroring the successful processes we have created for maintaining and
improving the existing human reference genome. To make this collection of assemblies comparable, we will
then create a comprehensive map of the genomic variants that exist among them. Using the human genome
reference cohort and variation map, we will create a human pan-genome reference that has three
complementary and essential parts: (i) a sequence graph that encodes the genomes in a non-redundant
manner by merging together shared sequences; (ii) a searchable encoding of the haplotypes in the sequence
graph, something that the graph itself does not capture; and (iii) a coordinate system that makes it possible
to refer to all the variation equally while preserving backwards compatibility with GRCh38. To build this graph
we will use tools that we have developed and work across the community to test and prototype the approach,
releasing stable and versioned pan-genome references. We also propose a plan to handle error reports from
the consortium and broader community, and to fix errors in the references via data analysis, curation and
targeted sequencing. We will follow a completely open model to ensure all data – primary data, genome
assemblies and pan-genome reference – are simultaneously available via AnVIL and accessioned through
appropriate international databases. We will add value to this resource by creating a core set of pan-genome
functional annotations, focusing primarily on genes. To make working with and migrating to the proposed
pan-genome reference straightforward, we will foster the creation of new and updated tools. Our strategy
harnesses the community by working with tool developers, in particular by establishing and promoting exchange
formats and by creating benchmarks to promote best-of-breed community methods.
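The three-part reference described above (a sequence graph that merges shared sequence, a searchable encoding of haplotypes, and a coordinate system compatible with the linear reference) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the project's tooling, which the abstract does not name. It assumes variants arrive as sorted, non-overlapping `(pos, ref, alt)` tuples with 0-based positions against a linear reference string standing in for GRCh38. The names `build_graph`, `haplotype_path`, `HaplotypeIndex` and `to_grch38` are hypothetical. The haplotype index is a plain inverted index standing in for the compressed indexes a real pan-genome would need, and the `(node, offset)` coordinate scheme with projection onto reference offsets is one simple possibility, not necessarily the scheme the project adopted.

```python
# Toy pan-genome sketch; all names are hypothetical, not the project's tools.
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SequenceGraph:
    """Toy sequence graph: each node stores a DNA segment exactly once."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    # node id -> 0-based start on the linear reference (reference nodes only)
    ref_start: dict[int, int] = field(default_factory=dict)

    def add_node(self, seq: str, ref_pos: int | None = None) -> int:
        node_id = len(self.nodes)
        self.nodes[node_id] = seq
        if ref_pos is not None:
            self.ref_start[node_id] = ref_pos
        return node_id

    def spell(self, path: list[int]) -> str:
        """Concatenate node sequences along a path."""
        return "".join(self.nodes[n] for n in path)


def build_graph(reference: str, variants: list[tuple[int, str, str]]):
    """Merge a linear reference with sorted, non-overlapping (pos, ref, alt)
    variants. Sequence shared by all haplotypes becomes one node; each
    variant becomes a ref/alt bubble."""
    graph = SequenceGraph()
    bubbles = []  # (shared_node, ref_node, alt_node) per variant
    tails = []    # nodes that must link to the next shared node
    cursor = 0
    for pos, ref_allele, alt_allele in variants:
        assert reference[pos:pos + len(ref_allele)] == ref_allele, "REF mismatch"
        shared = graph.add_node(reference[cursor:pos], ref_pos=cursor)
        graph.edges.update((t, shared) for t in tails)
        ref_node = graph.add_node(ref_allele, ref_pos=pos)
        alt_node = graph.add_node(alt_allele)  # off the reference path
        graph.edges.update({(shared, ref_node), (shared, alt_node)})
        bubbles.append((shared, ref_node, alt_node))
        tails = [ref_node, alt_node]
        cursor = pos + len(ref_allele)
    last = graph.add_node(reference[cursor:], ref_pos=cursor)
    graph.edges.update((t, last) for t in tails)
    return graph, bubbles, last


def haplotype_path(bubbles: list[tuple[int, int, int]], last: int,
                   carried: set[int]) -> list[int]:
    """Walk the graph, taking the alt node for each variant index in `carried`."""
    path = []
    for i, (shared, ref_node, alt_node) in enumerate(bubbles):
        path += [shared, alt_node if i in carried else ref_node]
    return path + [last]


class HaplotypeIndex:
    """Haplotypes as named paths plus a node -> haplotype inverted index."""

    def __init__(self):
        self.paths = {}
        self.by_node = defaultdict(set)

    def add(self, name: str, path: list[int]) -> None:
        self.paths[name] = path
        for node in path:
            self.by_node[node].add(name)

    def find(self, subpath: list[int]) -> set[str]:
        """Haplotypes whose path contains `subpath` as a contiguous run."""
        candidates = set.intersection(*(self.by_node[n] for n in subpath))
        k = len(subpath)
        return {name for name in candidates
                if any(self.paths[name][i:i + k] == subpath
                       for i in range(len(self.paths[name]) - k + 1))}


def to_grch38(graph: SequenceGraph, node: int, offset: int) -> int | None:
    """Project a graph coordinate (node, offset) onto the linear reference;
    alternate-allele nodes have no linear position and return None."""
    start = graph.ref_start.get(node)
    return None if start is None else start + offset


# Usage: one SNV and one 1-bp deletion against a 10-bp toy reference.
reference = "ACGTACGTAC"
variants = [(2, "G", "T"), (6, "GT", "G")]
graph, bubbles, last = build_graph(reference, variants)
index = HaplotypeIndex()
for name, carried in {"ref": set(), "h1": {0}, "h2": {0, 1}}.items():
    index.add(name, haplotype_path(bubbles, last, carried))
print(graph.spell(index.paths["h2"]))  # ACTTACGAC
print(sorted(index.find([2, 3])))      # ['h1', 'h2']
print(to_grch38(graph, 3, 1))          # 4
```

In this toy, shared sequence is stored once: the seven nodes hold 12 bases, while the three haplotypes they spell total 29 bases. Every allele, including alternate ones, is addressable as a `(node, offset)` pair, and only nodes on the reference path project back to linear coordinates, which is the sense in which such a scheme can stay backwards compatible.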

## Key facts

- **NIH application ID:** 10020432
- **Project number:** 5U41HG010972-02
- **Recipient organization:** WASHINGTON UNIVERSITY
- **Principal Investigator:** Ting Wang
- **Activity code:** U41
- **Funding institute:** NIH (National Human Genome Research Institute, per the "HG" code in the project number)
- **Fiscal year:** 2020
- **Award amount:** $2,386,814
- **Award type:** 5 (non-competing continuation)
- **Project period:** — → —

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10020432

## Citation

> US National Institutes of Health, RePORTER application 10020432, Maintaining, improving, and providing the human reference (5U41HG010972-02). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10020432. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
