# Long-read sequence and assembly of segmental duplications

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2022 · $717,900

## Abstract

The completion of the first human genome has revealed a complex pattern of recent duplications that contributes
disproportionately to human genetic variation. Our genome is particularly enriched for interspersed segmental
duplications, which harbor rapidly evolving genes and predispose our species to copy-number variation and
recurrent rearrangements associated with disease. Their length, sequence identity, and structural variation,
however, still complicate genome assembly and represent a major impediment to generating telomere-to-telomere
assemblies of human genomes. The long-term objective of this research program has been to
develop computational and experimental methods to understand the organization, genetic diversity, and
disease impact of segmental duplications. The goal of this competing renewal is to combine long-read sequencing
technologies with graph-based approaches to resolve the most complex regions in hundreds of human and
ape genomes. There are four aims: (1) determine the sequence structure of these recent duplications in
humans by generating complete, high-quality reference sequences with orthogonal long-read
sequencing technologies; (2) understand the genetic diversity of this structure by focusing on the most
dynamic and problematic gene-rich regions in more than 350 human genomes and a diverse set of non-human ape
genomes; (3) generate matched DNA and RNA long-read data to explore the transcriptional potential and
epigenetic features of segmental duplications in the human genome; and (4) develop a graph-based genotyper
specifically optimized to assay copy-number polymorphic duplicated loci in short-read whole-genome sequence
data, allowing their diversity to be explored more systematically. This work will provide fundamental new
insights into the structural complexity of human genomes and the mutational processes that have shaped
them. It will identify new copy-number polymorphic genes and their distribution among human populations and
provide our first assessment of how such genomic regions are regulated and lead to the emergence of new
genes. This research will also add new sequence to reference genomes, facilitate
more routine telomere-to-telomere assembly, and provide us with the ability to systematically explore genetic
variation of regions frequently overlooked as part of disease-association studies.

## Key facts

- **NIH application ID:** 10441973
- **Project number:** 2R01HG002385-21
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Evan Eichler
- **Activity code:** R01
- **Funding institute:** NIH (National Human Genome Research Institute, per the HG institute code in the project number)
- **Fiscal year:** 2022
- **Award amount:** $717,900
- **Award type:** 2 (competing renewal)
- **Project period:** 2001-09-21 → 2027-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10441973

## Citation

> US National Institutes of Health, RePORTER application 10441973, Long-read sequence and assembly of segmental duplications (2R01HG002385-21). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10441973. Licensed CC0.
