# Development and application of a pipeline to identify tandem repeat expansions as a cause of Parkinson's disease in the AMP PD cohort

> **NIH U01** · ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI · 2020 · $506,627

## Abstract

Tandem Repeat Expansions (TREs), most commonly of triplet repeats such as poly(CAG), are known to underlie >30 different human neurological diseases. While the majority of TREs identified to date have been found in late-onset neurodegenerative disorders such as hereditary ataxias and Huntington disease, TREs have also been identified in patients with Parkinson's disease (PD). However, despite this evidence that variation in tandem repeat (TR) sequences can act as the causative mutation in some cases of PD, there have been no concerted efforts in PD cohorts either to systematically screen for novel TREs or to genotype variable number tandem repeat (VNTR) copy numbers.

Newly developed bioinformatic approaches for analyzing Whole Genome Sequencing (WGS) data now provide an opportunity to fill this knowledge gap. Drawing on the expertise we have gained working on other large datasets, we will first develop efficient, cloud-deployable, end-to-end analytical pipelines for short-read WGS data based on the STRetch and GangSTR algorithms. These will allow the rapid and cost-effective identification of TREs in any cohort of interest. We will then apply these pipelines to samples with WGS data available from the AMP PD database and Terra to identify novel TREs that are likely causal for PD. Finally, we will perform validation experiments on predicted TREs that are likely causal for PD.

1. Our primary goal is to develop comprehensive and efficient analytical pipelines based on the bioinformatic tools STRetch and GangSTR to enable the identification of tandem repeat expansions in short-read genome sequencing data using the cloud platform Terra.
2. Our secondary goal is to identify TREs that show potential causative associations with PD using whole genome sequencing data from 6,206 PD patients and controls. We will apply our optimized bioinformatic pipeline to analyze available WGS data in the AMP PD database, supplemented with data from additional control samples from The 1000 Genomes Project.
3. Finally, we will attempt to validate putative TREs detected in Aim 2 that show associations with PD using DNA samples that are available from the participating AMP PD biobanks.

Given that TREs represent established mutational mechanisms that contribute to a variety of late-onset neurodegenerative conditions, we believe that the study of TR variation in PD represents a logical step that has a high likelihood of uncovering novel genetic causes of PD.

## Key facts

- **NIH application ID:** 10129646
- **Project number:** 1U01NS120241-01
- **Recipient organization:** ICAHN SCHOOL OF MEDICINE AT MOUNT SINAI
- **Principal Investigator:** Andrew James Sharp
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $506,627
- **Award type:** 1
- **Project period:** 2020-09-30 → 2021-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10129646

## Citation

> US National Institutes of Health, RePORTER application 10129646, Development and application of a pipeline to identify tandem repeat expansions as a cause of Parkinson's disease in the AMP PD cohort (1U01NS120241-01). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10129646. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
