# Identification of novel pathogenic tandem repeat expansions using long read sequencing

> **NIH R01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2020 · $624,063

## Abstract

Large expansions of tandemly repeated (TR) DNA sequences (e.g., polyCAG) are known to underlie >30
different human neurological diseases, including Huntington’s disease, fragile X syndrome, and myotonic dystrophy. The
vast majority of known TR expansions are observed in adult onset degenerative neuromuscular disorders and
ataxia syndromes. Although significant recent advances now enable TRs to be genotyped
from high-throughput sequencing, the methods currently used to sequence human genomes are unable to
identify TR expansions, as they read only very short fragments of DNA. However, recent advances in, and
falling costs of, sequencing technologies such as Pacific Biosciences SMRT sequencing, which generates much
longer reads, hold the promise of detecting previously undetected repeat expansions. Here we will perform whole
genome sequencing using combined Pacific Biosciences (PacBio) on a selected cohort of patients with
unsolved ataxias and Huntington’s-like disease, in which all known TR expansions and other mutations have
been excluded. Many of these samples come from multi-generation pedigrees with dominant inheritance that
show genetic anticipation and linkage information that localizes the pathogenic mutation to a subset of the
genome, thus representing an optimized cohort in which to search for unknown pathogenic TR expansions.

In order to identify TR expansions underlying human disease, it is first necessary to
characterize the spectrum of tandem repeat variation within the normal population. Using genomes of 26
individuals sequenced with PacBio, we will use novel algorithms we have developed, MsPac and
PacMONSTR, to generate a survey of the size distribution of all TRs in the normal human genome. This will be
supplemented by TR genotypes generated by HipSTR from 1,500 Illumina genomes. This information will
provide a baseline survey of TR variation that will allow us to identify pathogenic TR expansions in samples with
ataxia and neurodegenerative disease and, as we show, also enables us to identify candidate TRs that are
likely to expand in human disease. Using this approach, we will first perform targeted genotyping, in 250 samples
with SCA/HD phenocopies, of four polyglutamine TRs that show strong signatures of instability.

We will next perform PacBio genome sequencing of 100 individuals from 40 pedigrees with unsolved
ataxia/HD-like disease, using a selected cohort of samples in which all known genetic and environmental
causes have already been excluded. We hypothesize that the causal mutations in some of these pedigrees will be
novel expanded TRs that have remained invisible to previous short-read approaches. We will search for novel
TR expansions not observed in our control population.

Using this optimized cohort and novel hybrid long-read sequencing approach, this proposal will lead to
the identification of novel pathogenic TR expansions that underlie human neurological diseases, yielding
significant advances in our understandin...
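
The control-comparison step described in the abstract (searching for TR expansions not observed in the control population) can be illustrated with a minimal Python sketch. This is not the MsPac, PacMONSTR, or HipSTR method; the function name, the locus labels, the allele lengths, and the `min_excess_units` threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def flag_candidate_expansions(
    controls: Dict[str, List[int]],
    patient: Dict[str, List[int]],
    min_excess_units: int = 10,  # hypothetical threshold, in repeat units
) -> Dict[str, int]:
    """Map each flagged locus to how far the patient's longest allele
    exceeds the longest allele seen in controls (in repeat units)."""
    flagged: Dict[str, int] = {}
    for locus, alleles in patient.items():
        control_alleles = controls.get(locus)
        # Skip loci with no control baseline or no patient genotype.
        if not control_alleles or not alleles:
            continue
        excess = max(alleles) - max(control_alleles)
        if excess >= min_excess_units:
            flagged[locus] = excess
    return flagged


# Made-up repeat counts per locus (not real data).
controls = {"locusA": [18, 20, 22, 25], "locusB": [10, 11, 12]}
patient = {"locusA": [21, 80], "locusB": [11, 12]}

candidates = flag_candidate_expansions(controls, patient)
print(candidates)
```

A simple maximum-based cutoff is used here only because it is easy to follow; a real analysis would also need to account for genotyping error, control sample size, and segregation within each pedigree.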

## Key facts

- **NIH application ID:** 9983189
- **Project number:** 5R01NS105781-03
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** Andrew James Sharp
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $624,063
- **Award type:** 5
- **Project period:** 2018-09-30 → 2023-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9983189

## Citation

> US National Institutes of Health, RePORTER application 9983189, Identification of novel pathogenic tandem repeat expansions using long read sequencing (5R01NS105781-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9983189. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
