Gene and Protein Annotation in Highly-Identical Segmental Duplications

NIH RePORTER · NIH · F30 · $35,760

Abstract

Project Summary/Abstract: Genes in highly identical segmental duplications (SDs) play critical roles in human evolution and disease. SDs themselves mediate pathogenic duplications, deletions, and other rearrangements whose effects range from neurodevelopmental conditions like autism to syndromic congenital diseases. The genes contained within SDs, once duplicated, are fertile ground for adaptive tinkering, and may provide innovations that underlie the evolution of human-specific traits. However, the duplicate nature of these genes has always presented extra challenges to their study. They are found in regions of the genome that are some of the most difficult to sequence and assemble; they suffer from incomplete and inaccurate annotation due to the difficulty of correctly assigning and assembling sequenced fragments of transcripts; and, relatedly, for many duplicated genes it is not known whether they are functional, i.e., whether they encode a translated and functioning protein. This project seeks to annotate segmentally duplicated genes at the level of transcription and translation and proposes a strategy to address these challenges. We will leverage a haploid genome to better discriminate between highly identical copies of genome sequence; we will combine single-molecule long-read sequencing technology with a custom cDNA enrichment strategy to accurately determine transcription of SD genes; and we will take advantage of new developments in mass spectrometry technology to identify paralog-specific peptides and determine which of these genes are translated. The goal of this study is to identify functional, protein-coding genes among segmentally duplicated regions of the human genome. The generalizable approach developed in this study can be applied to duplicated space in other genomes as well. These genes will serve as candidates for future studies of human evolution and disease.
If successful, this study will shed considerable light on one of the oldest and most challenging problems in the study of the human genome.

Key facts

NIH application ID
9837460
Project number
5F30HG009478-04
Recipient
UNIVERSITY OF WASHINGTON
Principal Investigator
Max Logan Dougherty
Activity code
F30
Funding institute
NIH
Fiscal year
2020
Award amount
$35,760
Award type
5
Project period
2016-12-16 → 2020-06-15