Development and application of a pipeline to identify tandem repeat expansions as a cause of Parkinson's disease in the AMP PD cohort

NIH RePORTER · NIH · U01 · $506,627

Abstract

Tandem repeat expansions (TREs), most commonly of triplet repeats such as poly(CAG), are known to underlie more than 30 different human neurological diseases. While most TREs identified to date have been found in late-onset neurodegenerative disorders such as the hereditary ataxias and Huntington disease, TREs have also been identified in patients with Parkinson's disease (PD). However, despite this evidence that variation in tandem repeat (TR) sequences can act as the causative mutation in some cases of PD, there have been no concerted efforts in PD cohorts either to systematically screen for novel TREs or to genotype variable number tandem repeat (VNTR) copy numbers. Newly developed bioinformatic approaches for analyzing whole genome sequencing (WGS) data now provide an opportunity to fill this knowledge gap.

Drawing on the expertise we have gained working with other large datasets, we will first develop efficient, end-to-end analytical pipelines for analyzing short-read WGS data with the STRetch and GangSTR algorithms that can be deployed on the cloud. These will allow rapid and cost-effective identification of TREs in any cohort of interest. We will then apply these pipelines to samples with WGS data available from the AMP PD database and Terra to identify novel TREs that are likely causal for PD. Finally, we will perform validation experiments on the predicted TREs that are likely causal for PD.

1. Our primary goal is to develop comprehensive and efficient analytical pipelines, based on the bioinformatic tools STRetch and GangSTR, that enable the identification of tandem repeat expansions in short-read genome sequencing data on the cloud platform Terra.
2. Our secondary goal is to identify TREs that show potential causative associations with PD using WGS data from 6,206 PD patients and controls. We will apply our optimized bioinformatic pipeline to the WGS data available in the AMP PD database, supplemented with data from additional control samples from the 1000 Genomes Project.
3. Finally, we will attempt to validate putative TREs detected in Aim 2 that show associations with PD, using DNA samples available from the participating AMP PD biobanks.

Given that TREs are an established mutational mechanism contributing to a variety of late-onset neurodegenerative conditions, we believe that studying TR variation in PD is a logical step with a high likelihood of uncovering novel genetic causes of PD.
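The abstract does not specify how candidate expansions are scored or tested against PD status. As a minimal sketch of the two ideas behind Aim 2, assuming per-sample repeat lengths (in repeat units) at a single locus have already been extracted from genotype calls, the Python below flags samples whose repeat length is an upper outlier relative to the pooled cohort (a median/MAD robust z-score) and then applies a one-sided Fisher's exact test for an excess of carriers among cases. This is not the statistical model used by STRetch or GangSTR, nor the project's own method; the function names `robust_z_scores`, `flag_expansions` and `fisher_greater`, the outlier threshold of 5 and the 1-repeat-unit minimum scale are hypothetical choices for illustration, and the toy data are invented.

```python
import math
from statistics import median


def robust_z_scores(lengths, min_scale=1.0):
    """Median/MAD z-scores of per-sample repeat lengths at one locus.

    min_scale (hypothetical, in repeat units) keeps the denominator positive
    when most samples share one allele and the MAD is therefore zero.
    """
    med = median(lengths)
    mad = median([abs(x - med) for x in lengths])
    # 1.4826 rescales the MAD to the standard deviation of a normal distribution.
    scale = max(1.4826 * mad, min_scale)
    return [(x - med) / scale for x in lengths]


def flag_expansions(lengths, threshold=5.0):
    """Indices of samples whose repeat length is an upper outlier (hypothetical cutoff)."""
    return [i for i, z in enumerate(robust_z_scores(lengths)) if z > threshold]


def fisher_greater(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for an excess of carriers among cases.

    Sums the hypergeometric probabilities P(X >= case_carriers) with the
    table margins (cases, controls, total carriers) held fixed.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    numerator = sum(
        math.comb(carriers, k) * math.comb(total - carriers, n_cases - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return numerator / math.comb(total, n_cases)


# Invented toy data: repeat lengths (repeat units) at one locus.
cases = [12, 13, 12, 41, 12, 38, 13, 12]
controls = [12, 13, 12, 12, 14, 12, 13, 12]
pooled = cases + controls

flagged = flag_expansions(pooled)
case_carriers = sum(1 for i in flagged if i < len(cases))
control_carriers = len(flagged) - case_carriers
p = fisher_greater(case_carriers, len(cases), control_carriers, len(controls))
print(flagged, case_carriers, control_carriers, round(p, 4))
```

On the toy data, the two long alleles (41 and 38 units) are both flagged and both fall in cases, yet the one-sided p-value is 7/30 (about 0.23). This illustrates why cohort-scale sample sizes matter for association testing of rare expansions. The median/MAD scale is used rather than the mean and standard deviation so that the expanded alleles themselves do not inflate the spread they are compared against.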

Key facts

NIH application ID: 10129646
Project number: 1U01NS120241-01
Recipient: Icahn School of Medicine at Mount Sinai
Principal Investigator: Andrew James Sharp
Activity code: U01
Funding institute: NIH
Fiscal year: 2020
Award amount: $506,627
Award type: 1
Project period: 2020-09-30 → 2021-08-31