# Systematic characterization of tandem repeat variants contributing to complex traits

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2024 · $638,253

## Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic loci linked to complex
traits, but determining the causal variants, target genes, and biological mechanisms responsible for each signal
remains challenging. While GWAS has traditionally focused on single nucleotide polymorphisms (SNPs), the
advent of biobank-scale next-generation sequencing datasets and advanced functional genomics
techniques now enables systematic interrogation of the role of more complex variants in polygenic traits.

We focus on the role of genetic variation at repetitive regions of the genome. Specifically, we consider
two repeat types: short tandem repeats (STRs), consisting of repeated motifs of 1-6 bp, and variable number
tandem repeats (VNTRs), with motifs of 7+ bp, which we collectively refer to as tandem repeats (TRs). TRs exhibit
rapid mutation rates that render them one of the largest sources of genetic variation in humans. Increasing
evidence suggests that TRs act as an important source of causal variants for complex traits and may drive some
of the strongest GWAS signals identified to date. Yet, due to bioinformatic and experimental challenges in
studying repeats, the genome-wide role of TRs in complex human traits is only beginning to be uncovered.
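
The motif-length convention above (1-6 bp for STRs, 7+ bp for VNTRs) can be illustrated with a minimal Python sketch. The function names `classify_tandem_repeat` and `longest_repeat_run` are hypothetical and chosen only for this illustration; they are not part of the project's tools, and real TR genotypers handle imperfect repeats and sequencing error, which this sketch ignores.

```python
def classify_tandem_repeat(motif: str) -> str:
    """Label a repeat by motif length: 1-6 bp is an STR, 7+ bp is a VNTR."""
    if not motif:
        raise ValueError("motif must be non-empty")
    return "STR" if len(motif) <= 6 else "VNTR"


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Count the longest run of perfect, back-to-back copies of the motif.

    Illustrative only: assumes exact matches, with no interruptions.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    best = 0
    for start in range(len(sequence)):
        copies = 0
        pos = start
        # Extend the run while another exact copy begins at `pos`.
        while sequence.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


if __name__ == "__main__":
    # A toy allele carrying three copies of the 3 bp motif "CAG".
    print(classify_tandem_repeat("CAG"), longest_repeat_run("TTCAGCAGCAGTT", "CAG"))
```

The copy number returned by `longest_repeat_run` is the quantity that varies between individuals at a TR locus, which is why TR alleles are multi-valued rather than biallelic like most SNPs.
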

We hypothesize that TR variants are key drivers of complex traits. We recently published the first
genome-wide integration of TRs into the GWAS framework. This work identified 93 STRs predicted to causally impact
blood and serum biomarker traits and estimated that STRs explain 5-10% of GWAS signals for these traits. We have
experimentally interrogated the effects of thousands of promoter TRs by optimizing a massively parallel reporter
assay (MPRA) to enable the study of low-complexity sequences. Using our MPRA, we showed
widespread and cell-type-specific TR effects on expression.

While these findings offer intriguing evidence that thousands of TRs contribute to human phenotypes,
they have been limited by the range of TRs that could be accurately imputed into available GWAS datasets, and
the biological mechanisms by which TRs affect complex traits remain unknown in most cases. Here, we
leverage (i) newly available whole genome sequencing (WGS) for hundreds of thousands of individuals from UK
Biobank (UKB) and All of Us (AoU), which will enable direct TR genotyping rather than imputation, (ii) a suite of
computational tools we have developed for population-scale TR analysis and association testing, and (iii) our
recently developed MPRA and genome editing frameworks for experimental interrogation of TR effects to
systematically evaluate the contribution of TRs to complex traits in humans. Using these, we will develop scalable
methods to perform TR-based GWAS in large biobanks to generate a comprehensive catalog of TRs associated
with complex traits (Aim 1), use MPRA to investigate the effects of tens of thousands of TRs on gene regulation
(Aim 2) and perform deep characterization of candidate medi...

## Key facts

- **NIH application ID:** 10982270
- **Project number:** 2R01HG010885-05
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Alon Goren
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $638,253
- **Award type:** 2 (renewal)
- **Project period:** 2020-09-17 → 2028-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10982270

## Citation

> US National Institutes of Health, RePORTER application 10982270, Systematic characterization of tandem repeat variants contributing to complex traits (2R01HG010885-05). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10982270. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
