Sequence and Assembly of Segmental Duplications

NIH RePORTER · NIH · R01 · $608,375

Abstract

Despite the high quality of the human genome, important gaps remain in our understanding of its sequence organization, function, and variation. Our genome is particularly enriched for interspersed segmental duplications, which harbor rapidly evolving genes and predispose our species to recurrent rearrangements associated with disease. The long-term objective of our research has been to develop computational and experimental methods to understand the organization, genetic diversity, and disease impact of segmental duplications. The goal of this competing renewal is to begin to understand the function and variation of the duplicated genes themselves. We propose to focus here on human- and great ape-specific gene families mapping within the most complex and duplicated regions of our genome. There are four aims: (1) determine the sequence structure of these recent duplications by generating high-quality reference sequences using clone-based resources and long-read sequencing technologies; (2) understand the genetic diversity of this structure focusing on those that have most likely been targets of selection; (3) completely annotate the gene content to distinguish protein-encoding innovations from pseudogenes; and (4) test for neurodevelopmental disease association by comparing the burden of loss-of-function mutations in patients versus controls using available genome sequence data and molecular inversion probe assays. We hypothesize that segmental duplications have played an important role in human neurocognitive adaptation and that patterns of copy number polymorphisms and substitution will differ significantly between functional and nonfunctional paralogs. This research has the additional benefit that it will add new sequence to reference genomes, identify missing genes, and provide us with the ability to systematically explore genetic variation of regions frequently overlooked as part of disease-association studies.

Key facts

NIH application ID: 9933973
Project number: 5R01HG002385-19
Recipient: UNIVERSITY OF WASHINGTON
Principal Investigator: Evan Eichler
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $608,375
Award type: 5
Project period: 2001-09-21 → 2022-04-30