Revealing new short tandem repeat variation in the human population across sequencing technologies: towards rare disease diagnosis and discovery

NIH RePORTER · NIH · K99 · $120,455

Abstract

Short tandem repeats (STRs) are highly polymorphic DNA sequences made up of a 1–6 bp motif repeated in tandem. Expansions in dozens of STRs are associated with genetic disease. However, STRs are challenging to sequence and interpret, so individuals with STR disorders often go undiagnosed. In rare disease studies it is now standard practice to prioritize candidate pathogenic SNVs, indels and SVs by excluding variants with high allele frequencies in population-scale databases such as gnomAD. No comparable genome-wide database exists for large STR expansions. I will produce a publicly available community resource of STR variation, stratified by ancestry, to enable prioritization of candidate pathogenic STR expansions.

Long-read sequencing technologies from PacBio and Oxford Nanopore have been heralded as the solution for accurately genotyping long repeats, because their reads can span the entire repetitive region. However, several challenges in genotyping STRs from long reads are not adequately addressed by existing approaches. I will develop a method to genotype STRs from Oxford Nanopore long-read sequencing data. It will identify informative reads by combining alignment with detection of repetitive regions within the reads themselves, then infer the genotype by integrating evidence across multiple reads, informed by my investigation of biases in these technologies.

Drawing together new short- and long-read computational approaches for calling STR expansions with my population-scale STR catalog, and with an emphasis on diverse and underserved populations, this proposal will establish a genetic diagnosis for hundreds of patients while searching for new STR disease loci. I will analyze patient cohorts enriched for STR-associated phenotypes from the Undiagnosed Diseases Network (UDN), the University of Washington, the Harry Perkins Institute of Medical Research and Children’s Mercy Hospital, to solve cases and discover new disease-associated STRs in both short- and long-read sequencing data.
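The read-level and population-level steps described above can be illustrated with a toy Python sketch. It is not the method proposed here. All function names are hypothetical, repeat counting uses exact motif matches only, reads are combined into a diploid genotype with a simple two-cluster split swapped in for illustration, and the 1% population-frequency cutoff is an arbitrary assumption. The sketch also assumes reads have already been aligned and selected as spanning the locus.

```python
from statistics import median_low


def longest_motif_run(read: str, motif: str) -> int:
    """Count copies of `motif` in the longest uninterrupted, exact run in `read`.

    Every phase (start offset modulo the motif length) is scanned, so a run is
    found wherever it starts. Exact matching is a simplifying assumption: real
    Nanopore reads contain errors that a practical caller must tolerate.
    """
    k = len(motif)
    best = 0
    for phase in range(k):
        run = 0
        for i in range(phase, len(read) - k + 1, k):
            if read[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


def _sse(values: list[int]) -> float:
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def call_diploid_genotype(counts: list[int], min_support: int = 2,
                          min_gap: int = 2) -> tuple[int, int]:
    """Combine per-read repeat counts into a diploid (allele1, allele2) call.

    Hypothetical rule: sort the counts, pick the split into two groups that
    minimizes total within-group squared error (exact 1-D two-means), and
    require each group to have at least `min_support` reads and medians at
    least `min_gap` repeats apart; otherwise call a homozygous genotype.
    """
    xs = sorted(counts)
    if not xs:
        raise ValueError("no informative reads")
    best_split, best_cost = None, float("inf")
    for i in range(min_support, len(xs) - min_support + 1):
        cost = _sse(xs[:i]) + _sse(xs[i:])
        if cost < best_cost:
            best_split, best_cost = i, cost
    if best_split is not None:
        a1, a2 = median_low(xs[:best_split]), median_low(xs[best_split:])
        if a2 - a1 >= min_gap:
            return (a1, a2)
    allele = median_low(xs)
    return (allele, allele)


def population_frequency_at_least(length: int, population: list[int]) -> float:
    """Fraction of alleles in a (hypothetical) population catalog >= `length`."""
    if not population:
        return 0.0
    return sum(a >= length for a in population) / len(population)


def is_candidate_expansion(genotype: tuple[int, int], population: list[int],
                           max_freq: float = 0.01) -> bool:
    """Flag a genotype whose longer allele is rare in the population catalog.

    The 1% threshold is an illustrative assumption, not a published cutoff.
    """
    return population_frequency_at_least(max(genotype), population) <= max_freq


if __name__ == "__main__":
    # Simulated reads spanning a CAG locus: two alleles of ~20 and ~61 repeats.
    reads = ["TT" + "CAG" * n + "GG" for n in (20, 20, 21, 20, 60, 62, 61)]
    counts = [longest_motif_run(r, "CAG") for r in reads]
    genotype = call_diploid_genotype(counts)
    population = [18] * 50 + [20] * 40 + [23] * 10  # hypothetical catalog alleles
    print(genotype, is_candidate_expansion(genotype, population))  # (20, 61) True
```

Requiring at least two supporting reads per allele keeps a single outlier read from producing a spurious heterozygous call. A real caller would also need to handle sequencing errors within the repeat, reads that do not fully span the locus, and somatic variation in repeat length.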

Key facts

NIH application ID: 10768689
Project number: 5K99HG012796-02
Recipient: UTAH STATE HIGHER EDUCATION SYSTEM--UNIVERSITY OF UTAH
Principal Investigator: Harriet Dashnow
Activity code: K99
Funding institute: NIH
Fiscal year: 2024
Award amount: $120,455
Award type: 5
Project period: 2023-02-01 → 2024-07-01