# Systematic characterization of tandem repeat variants contributing to complex traits

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2020 · $704,999

## Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic loci associated with complex traits, but determining the causal variants, target genes, and biological mechanisms responsible for each signal has proven challenging. Furthermore, standard GWAS based on single nucleotide polymorphisms (SNPs) have been limited by failure to explain the majority of heritability for most traits studied and an inability to capture multi-allelic variants such as copy number variants (CNVs) and repeats not tagged by SNPs.

We focus on the role of genetic variation at repetitive regions of the genome. Specifically, we consider two repeat types: short tandem repeats (STRs), consisting of repeated motifs of 1-6 bp; and variable number tandem repeats (VNTRs), with motifs of 7+ bp. We collectively refer to STRs and VNTRs as tandem repeats (TRs). TRs encompass approximately 2 million loci comprising over 3% of the genome. They exhibit rapid mutation rates and are one of the largest sources of genetic variation. Growing evidence suggests that TRs are likely to account for part of the “missing heritability” of GWAS. However, due to bioinformatic and experimental challenges in studying repeats, the genome-wide role of TRs in human traits remains mostly unexplored.

We hypothesize that TR variants are key drivers of complex traits. We recently identified thousands of STRs predicted to causally regulate gene expression (termed expression STRs, or eSTRs) and revealed that eSTRs potentially act through a variety of mechanisms including modulating nucleosome positioning and DNA or RNA secondary structure. We additionally identified specific eSTRs likely underlying published GWAS signals for height and schizophrenia. Furthermore, other groups have recently discovered TRs as causal drivers of complex traits including malaria resistance, cancer risk, and bipolar disorder.

While these findings offer intriguing evidence that thousands of TRs contribute to human phenotypes, they have several limitations. These include: the range of TRs that can be accurately genotyped from next-generation sequencing (NGS); a lack of sufficiently large NGS datasets for most traits for performing association analyses; and limited understanding of the potential mechanisms by which TRs participate in gene regulation. Here, we leverage (i) our recently developed TR genotyping tools and (ii) our published haplotype panel allowing imputation of TRs into available SNP-array datasets, to systematically evaluate the contribution of TRs to gene regulation and complex traits in humans. We will first generate a comprehensive catalog of TRs associated with gene regulation (Aim 1) and establish a framework for validating TR effects using massively parallel reporter assays and genome editing (Aim 2). We will then impute more than 2 million TRs into large existing GWAS datasets and perform fine-mapping to identify TRs associated with a range of complex traits and deeply characteri...
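
The abstract defines the two repeat classes by motif length: STRs have motifs of 1-6 bp and VNTRs have motifs of 7 bp or more. As an illustration only, that rule can be sketched in a few lines of Python. The function names `classify_tandem_repeat` and `longest_run` are hypothetical and are not part of the project's genotyping tools; the copy-count helper is a naive scan for perfect repeats, not a genotyping method.

```python
def classify_tandem_repeat(motif: str) -> str:
    """Return 'STR' for motifs of 1-6 bp and 'VNTR' for motifs of 7+ bp.

    Hypothetical helper that only encodes the motif-length rule
    stated in the abstract.
    """
    motif = motif.strip().upper()
    if not motif or set(motif) - set("ACGT"):
        raise ValueError("motif must be a non-empty string over A, C, G, T")
    return "STR" if len(motif) <= 6 else "VNTR"


def longest_run(sequence: str, motif: str) -> int:
    """Count the longest run of perfect, back-to-back copies of motif.

    Naive scan for illustration; real TR genotypers also handle
    sequencing error, imperfect repeats, and read alignment.
    """
    classify_tandem_repeat(motif)  # reuse the validation; rejects empty motifs
    sequence = sequence.upper()
    motif = motif.strip().upper()
    best = 0
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Extend the run while another full copy of the motif follows.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


if __name__ == "__main__":
    print(classify_tandem_repeat("CAG"))        # 3 bp motif
    print(classify_tandem_repeat("ACGTACG"))    # 7 bp motif
    print(longest_run("TTCAGCAGCAGTT", "CAG"))  # copies of CAG in a row
```

The copy number returned by a helper like `longest_run` corresponds to the multi-allelic repeat length that, per the abstract, SNP-based GWAS does not capture directly.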

## Key facts

- **NIH application ID:** 10052847
- **Project number:** 1R01HG010885-01A1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Alon Goren
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $704,999
- **Award type:** 1
- **Project period:** 2020-09-17 → 2024-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10052847

## Citation

> US National Institutes of Health, RePORTER application 10052847, Systematic characterization of tandem repeat variants contributing to complex traits (1R01HG010885-01A1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10052847. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
