Systematic characterization of tandem repeat variants contributing to complex traits

NIH RePORTER · NIH · R01 · $638,253

Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic loci linked to complex traits, but determining the causal variants, target genes, and biological mechanisms responsible for each signal remains challenging. While GWAS has traditionally focused on single nucleotide polymorphisms (SNPs), the advent of biobank-scale next-generation sequencing datasets and advanced functional genomics techniques now enables systematic interrogation of the role of more complex variants in polygenic traits. We focus on the role of genetic variation at repetitive regions of the genome. Specifically, we consider two repeat types: short tandem repeats (STRs), consisting of repeated motifs of 1-6 bp, and variable number tandem repeats (VNTRs), with motifs of 7 bp or longer, which we collectively refer to as tandem repeats (TRs). TRs exhibit rapid mutation rates that render them one of the largest sources of genetic variation in humans. Increasing evidence suggests that TRs act as an important source of causal variants for complex traits and may drive some of the strongest GWAS signals identified to date. Yet, due to bioinformatic and experimental challenges in studying repeats, the genome-wide role of TRs in complex human traits is only beginning to be uncovered. We hypothesize that TR variants are key drivers of complex traits. We recently published the first genome-wide integration of TRs into the GWAS framework. This work identified 93 STRs predicted to causally impact blood and serum biomarker traits and estimated that STRs explain 5-10% of GWAS signals for these traits. We have experimentally interrogated the effects of thousands of promoter TRs by optimizing a massively parallel reporter assay (MPRA) to enable the study of low-complexity sequences. Using our MPRA, we showed widespread and cell-type-specific TR effects on expression.
While these findings offer intriguing evidence that thousands of TRs contribute to human phenotypes, they have been limited by the range of TRs that could be accurately imputed into available GWAS datasets, and the biological mechanisms by which TRs affect complex traits remain unknown in most cases. Here, we leverage (i) newly available whole genome sequencing (WGS) for hundreds of thousands of individuals from UK Biobank (UKB) and All of Us (AoU), which will enable direct TR genotyping rather than imputation, (ii) a suite of computational tools we have developed for population-scale TR analysis and association testing, and (iii) our recently developed MPRA and genome editing frameworks for experimental interrogation of TR effects to systematically evaluate the contribution of TRs to complex traits in humans. Using these, we will develop scalable methods to perform TR-based GWAS in large biobanks to generate a comprehensive catalog of TRs associated with complex traits (Aim 1), use MPRA to investigate the effects of tens of thousands of TRs on gene regulation (Aim 2), and perform deep characterization of candidate medi...
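
As an illustration only, the abstract's repeat-type definitions (STRs have motifs of 1-6 bp, VNTRs have motifs of 7 bp or more) can be written as a motif-length rule. The sketch below is not taken from the project's tools; the function name `classify_tandem_repeat` and the example motifs are hypothetical.

```python
def classify_tandem_repeat(motif: str) -> str:
    """Classify a tandem repeat by motif length.

    Hypothetical helper: applies only the length thresholds stated in the
    abstract (1-6 bp -> STR, 7+ bp -> VNTR). Real TR genotyping tools use
    richer criteria that are not described in this listing.
    """
    motif_length = len(motif)
    if motif_length < 1:
        raise ValueError("motif must contain at least one base")
    return "STR" if motif_length <= 6 else "VNTR"


# Made-up example motifs, chosen to sit on either side of the 6/7 bp boundary.
for motif in ("A", "CAG", "AAAAAG", "ACCTGGA"):
    print(motif, len(motif), classify_tandem_repeat(motif))
```

The rule depends only on the length of the repeated unit, not on how many copies are present, so copy-number variation at a locus does not change its class.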

Key facts

NIH application ID
10982270
Project number
2R01HG010885-05
Recipient
UNIVERSITY OF CALIFORNIA, SAN DIEGO
Principal Investigator
Alon Goren
Activity code
R01
Funding institute
NIH
Fiscal year
2024
Award amount
$638,253
Award type
2
Project period
2020-09-17 → 2028-06-30