# Scalable tool and comprehensive maps to interpret structural variation across the neuropsychiatric spectrum

> **NIH R01** · BROAD INSTITUTE, INC. · 2022 · $785,192

## Abstract

Structural variation (SV) is a major driver of genome organization, content, and diversity. Over the last decade,
many studies have demonstrated the significance of SV to the genetic architecture of neuropsychiatric disorders
(NPDs) such as autism spectrum disorder (ASD), schizophrenia, bipolar disorder, and ADHD. These studies
have suggested a significant impact of SV within individual disorders, as well as shared genetic etiology across
a spectrum of NPDs. However, despite this etiological relevance, most studies of SV in NPDs have focused on
large canonical copy number variation (CNV) using microarray technologies. Population genetic studies have
paralleled these efforts, as most SV databases are dominated by array-based CNV data. Several whole-genome
sequencing (WGS) references have now been created to characterize SV, such as the 1000 Genomes Project
in ~2,500 individuals. These datasets have been invaluable to human genetic research; however, they have
captured only a small fraction of the SV that is accessible to WGS and are limited in ancestral diversity, primarily due to
limitations in technologies, algorithms, and sample sizes. These challenges have also reduced the value of these
references for clinical interpretation of SV in diagnostic screening. This study will provide maps of canonical and
complex SVs on a scale >50-fold that of the 1000 Genomes Project by systematically analyzing aggregated
WGS datasets in the Genome Aggregation Database (gnomAD). We will integrate our completed prototype of
a scalable tool for cloud-based SV discovery within the universally accessible Genome Analysis Toolkit
(GATK-SV; Aim 1). GATK-SV will provide an open source framework that can capture a spectrum of canonical and
complex SV, within the capabilities of short-read WGS, and will include a module for extensibility to long-read
WGS. We will apply these methods across the aggregation of diverse ancestries in gnomAD, a WGS extension
of our Exome Aggregation Consortium (ExAC) (Aim 2). The gnomAD dataset currently includes 85,000 WGS
samples, and this resource will exceed 150,000 genomes by the conclusion of Aim 2. We will use this reference
to define genomic regions recalcitrant to SV and provide systematic measures of SV constraint. We will then
perform WGS association analyses across >60,000 genomes in individuals with NPDs, including ASD,
schizophrenia, and bipolar disorder cases (Aim 3). In combination with the gnomAD SV maps and the integration
of microarray-based CNV aggregation, these analyses will be well powered to quantify the relative risk conferred
by SV in each individual disorder, and to explore shared risk across the NPD spectrum. Each aim will apply
innovative approaches to yield novel products, and we will freely distribute these tools, maps, and analyses
without restriction. Importantly, these data will also provide benchmarked references for diagnostic interpretation
across diverse ancestries, and an analytical framework for future...

## Key facts

- **NIH application ID:** 10414009
- **Project number:** 5R01MH115957-04
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** MICHAEL E TALKOWSKI
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $785,192
- **Award type:** 5
- **Project period:** 2019-08-02 → 2023-06-14

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10414009

## Citation

> US National Institutes of Health, RePORTER application 10414009, Scalable tool and comprehensive maps to interpret structural variation across the neuropsychiatric spectrum (5R01MH115957-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10414009. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*