# Gene and Protein Annotation in Highly-Identical Segmental Duplications

> **NIH F30** · UNIVERSITY OF WASHINGTON · 2020 · $35,760

## Abstract

Genes in highly identical segmental duplications (SDs) play critical roles in human evolution and disease. SDs
themselves mediate pathogenic duplications, deletions, and other rearrangements whose effects range from
neurodevelopmental conditions like autism to syndromic congenital diseases. The genes contained within
SDs, once duplicated, are fertile ground for adaptive tinkering, and may provide innovations that underlie the
evolution of human-specific traits.

However, the duplicate nature of these genes has always presented extra challenges to their study. They are
found in regions of the genome that are some of the most difficult to sequence and assemble; they suffer
from incomplete and inaccurate annotation due to the difficulty of correctly assigning and assembling
sequenced fragments of transcripts; and related to this, for many duplicated genes it is not known if they are
functional, that is, whether they encode a translated and functioning protein.

This project seeks to annotate segmentally duplicated genes at the level of transcription and translation and
proposes a strategy to address these challenges. We will leverage a haploid genome to better discriminate
between highly identical copies of genome sequence, we will combine single-molecule long-read sequencing
technology with a custom cDNA enrichment strategy to accurately determine transcription of SD genes, and
we will take advantage of new developments in mass spectrometry technology to identify paralog-specific
peptides and determine which of these genes are translated.

The goal of this study is to identify functional, protein-coding genes among segmentally duplicated regions of
the human genome. The generalizable approach developed in this study can be applied to duplicated space
in other genomes as well. These genes will serve as candidates for future studies of human evolution and
disease. If successful, this study will shed enormous light onto one of the oldest and most challenging
problems in the study of the human genome.

## Key facts

- **NIH application ID:** 9837460
- **Project number:** 5F30HG009478-04
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Max Logan Dougherty
- **Activity code:** F30
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $35,760
- **Award type:** 5
- **Project period:** 2016-12-16 → 2020-06-15

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9837460

## Citation

> US National Institutes of Health, RePORTER application 9837460, Gene and Protein Annotation in Highly-Identical Segmental Duplications (5F30HG009478-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9837460. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
