Title
Creating an advanced multi-ancestral resource and tools for short tandem repeat analysis in the AoURP researcher workbench

Abstract
The AoURP researcher workbench provides an unparalleled opportunity to study multi-ancestral human genome variation at scale, with the promise of improving health and disease outcomes for all people in the USA. One particular type of variation, tandem repeats (TRs), has only recently become amenable to bioinformatic analysis thanks to breakthroughs in software development and long-read sequencing. Unstable, expanded TRs have been linked to disease, and they are widely expected to play a much larger role in health and disease going forward. Over the past 5 years, we have developed TR resources and machine-learning-based tools (RExPRT) and have discovered novel disease-causing TRs, published in Nature Genetics and NEJM. The co-PIs of this proposal recently worked as part of the AoURP Long Reads Working Group to create a call set of over 3.5 billion TR alleles from 1,027 AoURP participants sequenced with PacBio HiFi reads, using the newly developed TRGT tool. This grant application proposes to characterize this recently established AoURP data resource by developing novel analytical tools to unearth interrelated patterns of tandem repeat length, motif, and flanking variation in unprecedented detail. In addition to this long-read-based data resource, we will characterize the TRs in the larger cohort of >250,000 Illumina short-read whole genomes produced by the AoURP, though in less detail, using tools such as ExpansionHunter. The products of this work will be available to all workbench users in the form of normative databases, tools, notebooks, and scripts to accelerate the study of TRs in the context of health records data. We believe this work will enable prioritization of disease-causing TR loci and lead to a better understanding of TR biology, which will be vital to the development of new therapeutics.