# Refining Mendelian disease analysis via detection of clinically relevant repeat variants

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2024 · $564,751

## Abstract

Whole-genome sequencing (WGS) has the potential to profile all clinically relevant genetic variants simultaneously. However, clinical variant discovery pipelines have focused largely on coding single nucleotide variants (SNVs), and to a lesser extent on regulatory SNVs and small indels, ignoring more complex classes of pathogenic variants such as repeats or structural rearrangements.

Repeats take many forms; we consider three classes: short tandem repeats (STRs), variable number tandem repeats (VNTRs), and low-copy repeats or segmental duplications, together accounting for more than 8% of the human genome. These variant classes have been implicated in a number of Mendelian diseases. More than 30 disorders, primarily neurodegenerative, are caused by STR expansions, including Huntington’s Disease, Fragile X Syndrome, ALS/FTD, and hereditary ataxias. Similarly, VNTRs have been implicated in a range of psychiatric and other traits, including medullary cystic kidney disease and type 1 diabetes. In many cases, disease progression is correlated with germline repeat counts, but sequence variation within individual repeat units and somatic instability of repeat length have also been shown to be pathogenic in some cases. Finally, mutations in more than 100 duplicated genes have been implicated in rare Mendelian disorders and cancer, including PMS2 in Lynch Syndrome and STRC in hearing loss. Taken together, diseases associated with these repeat classes affect millions of individuals worldwide.

Despite their relevance to disease, these repeat types are typically absent from sequence analysis pipelines due to the bioinformatics challenges they present. Over the last several years, we and others have made significant progress in developing methods to analyze clinically relevant repeats from short reads. However, important challenges remain, including genotyping long, complex, imperfect, or GC-rich repeats; inferring clinically relevant somatic variation; and reducing the computational burden of existing methods. Further, existing frameworks for predicting the pathogenicity of individual SNVs or indels are not applicable to most repeats, and thus there is a need for prioritization methods to predict the impact of new repeat variants.

The goal of this project is to make repeat analysis a standard component of existing Mendelian variant calling pipelines. To this end, we will develop novel methods for profiling repeat variants from long reads (Aim 1), extend our existing methods for short reads to consider more complex variant types (Aim 2), and establish a framework for prioritization of pathogenic repeat mutations (Aim 3).

## Key facts

- **NIH application ID:** 10825532
- **Project number:** 5R01HG010149-06
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Vineet Bafna
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $564,751
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-09-14 → 2027-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10825532

## Citation

> US National Institutes of Health, RePORTER application 10825532, Refining Mendelian disease analysis via detection of clinically relevant repeat variants (5R01HG010149-06). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10825532. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
