# Incorporating Analysis of Gene Paralog Variation Into Existing Genomics Datasets

> **NIH R03** · UNIVERSITY OF MICHIGAN AT ANN ARBOR · 2020 · $312,000

## Abstract

Gene duplication is a major mechanism for the evolution of novel gene functions. Copy-number and sequence
variation within multigene families are associated with many phenotypes, human diseases, and evolutionary
adaptations. Yet systematic incorporation of gene paralog variation into studies of genomic diversity is lacking.
Most existing tools are not well suited to delineating differences among gene family members or require
prohibitively large computational resources. We recently developed an approach, QuicK-mer2, which efficiently
estimates gene copy number in a paralog-specific manner. Application of our approach to data from the 1000
Genomes Project revealed rare gene-paralog variants that have not been previously reported. Here, we propose
applying QuicK-mer2 to create paralog-specific copy-number estimates from existing NIH Common Fund
genomics data sets. In Specific Aim 1, we will analyze genome sequencing data from the Genotype-Tissue
Expression (GTEx) consortium to define the effect of gene paralog variation on gene expression levels. Although
we will assess the entire genome, we will focus our analyses on variation among the largest family of transcription
factors, KRAB-ZFPs (Krüppel-associated box zinc finger proteins), to identify trans-acting expression QTL. In
Specific Aim 2, we will analyze variation among duplicated genes in the Gabriella Miller Kids First Data Resource
with a focus on structural birth defects, a phenotype to which copy-number variation is known to be a key
contributor. Many recurrent copy-number variants arise in regions that are flanked by large segments of
duplicated sequence with high identity. Many of these regions of segmental duplication also contain members
of duplicated gene families that have important biological functions. Here, we will focus on discovering previously
missed gene copy-number variation within the duplicated sequences themselves. Together, completion of these
aims will give a fuller picture of the extent of genomic variation and the impact of differences among gene
paralogs on gene regulation and disease.

## Key facts

- **NIH application ID:** 10104902
- **Project number:** 1R03OD030605-01
- **Recipient organization:** UNIVERSITY OF MICHIGAN AT ANN ARBOR
- **Principal Investigator:** Jeffrey M Kidd
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $312,000
- **Award type:** 1
- **Project period:** 2020-09-18 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10104902

## Citation

> US National Institutes of Health, RePORTER application 10104902, Incorporating Analysis of Gene Paralog Variation Into Existing Genomics Datasets (1R03OD030605-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10104902. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
