Center for Human Reference Genome Diversity

NIH RePORTER · NIH · U01 · $3,989,201

Abstract

The goal of our Center for Human Reference Genome Diversity is to generate genome assemblies from 350 individuals who together comprehensively capture the full extent of human diversity, with each assembly as error-free, gapless, complete, and correctly haplotype-phased as possible. We aim to capture more than 99% of allelic variants with an allele frequency above 1%, and to provide these genomes as a resource to the international community to enable genomic medicine and research addressing fundamental unanswered questions in biology and disease. We will employ a multi-platform approach using cutting-edge long-read and linked-read technologies to obtain the highest-quality phased genomes.

Aim 1 will focus on sample collection and on procuring cell lines from at least 350 individuals, with specific emphasis on filling gaps in the representation of human diversity. Aim 2 will generate highly contiguous, chromosome-level assemblies that are over 99% haplotype-phased for at least 700 haploid genomes from the 350 diploid samples. Aim 3 will finish these genomes so that each chromosome is gapless from telomere to telomere (T2T). Aim 4 will evaluate the genomes for accuracy and completeness and perform initial variant calling to assess the extent of human diversity.

We will use a novel combination of technologies, sequencing strategies, and algorithms developed by us and others to produce the highest-quality and most complete genome assemblies to date. Our effort will specifically target regions that other efforts have excluded, including segmental duplications, centromeres, and acrocentric DNA. To achieve these aims, we have assembled an exceptional team of leaders from around the world in consent ethics, sample collection, sample extraction, and high-quality genome sequencing, assembly, finishing, and evaluation.
The team also has expertise in using genomic technologies to address a broad range of scientific questions and is therefore highly cognizant of the practical needs of the biomedical researchers who will use this resource. The high-quality genomes produced will be passed to the Human Genome Reference Center (HGRC) and the Genome Reference Representation (GRR) groups for curation and release. The result will be a pan-human genome reference that represents important human diversity absent from the current reference genome. The data we generate will enable a fundamental shift in human genetics, fostering new discoveries from the single-nucleotide to the chromosomal level and revealing a more accurate and global view of the human population.

Key facts

NIH application ID: 10686965
Project number: 5U01HG010971-05
Recipient: UNIVERSITY OF CALIFORNIA SANTA CRUZ
Principal Investigator: Evan Eichler
Activity code: U01
Funding institute: NIH
Fiscal year: 2023
Award amount: $3,989,201
Award type: 5
Project period: 2019-09-18 → 2024-09-23