# Multi-allelic forms of human genome structural variation

> **NIH R56** · HARVARD MEDICAL SCHOOL · 2020 · $339,000

## Abstract

Hundreds of thousands of human genomes are being sequenced, enabled by a profound reduction in the cost of
sequencing. These data offer unprecedented opportunities to ascertain how the human genome varies and how
this variation shapes human biology. While mature analysis methods now readily detect fine-scale sequence
variation, larger-scale forms of genome variation, especially those with many structurally distinct alleles, are
challenging to recognize, analyze, and incorporate into association analyses. We seek to understand how
human genomes vary at these scales and how this variation contributes to human phenotypes.

We believe that it is possible to ascertain far more genetic variation in genome sequence data than is visible with
analysis methods today. There is vast under-utilized information in the statistical patterns that large collections
of sequence reads form across individuals, families, and populations, and in the haplotypes that
multi-allelic variants form together with SNPs and other variants. Our focus in this work will be on two large,
intriguing classes of genome variation that we seek to incorporate into routine genome analysis. One class
involves multi-allelic copy-number variants (CNVs), in which a genomic segment (one to several hundred kilobases in size) exists in
a wide range of copy numbers (such as 2–10) per diploid human genome, often varying in fine-scale sequence
as well as copy number. Another class involves higher-copy-number variable-number-of-tandem-repeat (VNTR)
polymorphisms, in which a shorter genomic sequence (tens to thousands of base pairs) exists in a wider range
of copy numbers (up to scores or even hundreds of copies) per diploid genome. We will advance analysis
methods that make it possible to measure sequence variation at these loci, identify the structural alleles from
which this variation arises, and analyze the relationships of such variation to human phenotypes. We will create
and distribute research software and data resources, such as reference haplotypes, that enable human
geneticists to incorporate such loci into association and fine-mapping analyses. We will also assess the
contribution of these kinds of variation to quantitative phenotypes that are being collected in large population
cohorts.

We hope that this work contributes to many discoveries about the genetic and biological basis of disease.

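Two ideas in this abstract can be made concrete with a minimal sketch. First, copy number can be inferred from the depth of sequence reads at a locus. Second, a diploid copy number usually does not reveal which structural alleles produced it. The Python below is an illustration only, not this project's method, which the abstract describes only at a high level. It assumes read depth scales linearly with copy number and that a sample's genome-wide median depth reflects two copies. The function names, example depths, and allele copy numbers (1 to 4) are hypothetical. Real analyses must also correct for GC content, mappability, and batch effects, and they typically model many samples jointly rather than rounding each sample on its own.

```python
from itertools import combinations_with_replacement
from statistics import median


def estimate_diploid_copy_number(locus_depth, genome_depths):
    """Continuous diploid copy-number estimate from read depth.

    Assumption (hypothetical, simplified): depth scales linearly with copy
    number, and the sample's genome-wide median depth corresponds to 2 copies.
    """
    baseline = median(genome_depths)
    if baseline <= 0:
        raise ValueError("genome-wide median depth must be positive")
    return 2.0 * locus_depth / baseline


def call_copy_number(locus_depth, genome_depths):
    """Round the continuous estimate to the nearest integer copy number.

    Note: Python's round() uses round-half-to-even at exact .5 values.
    """
    return round(estimate_diploid_copy_number(locus_depth, genome_depths))


def haplotype_pairs(diploid_cn, allele_cns):
    """List unordered pairs (a, b), a <= b, of per-haplotype copy numbers
    drawn from allele_cns whose sum equals the diploid copy number."""
    alleles = sorted(set(allele_cns))
    return [
        (a, b)
        for a, b in combinations_with_replacement(alleles, 2)
        if a + b == diploid_cn
    ]


if __name__ == "__main__":
    # Hypothetical sample: genome-wide depths around 30x, locus depth 76x.
    depths = [30, 31, 29, 30, 30]
    cn = call_copy_number(76, depths)
    print(cn)  # 5
    # Hypothetical structural alleles carrying 1 to 4 copies of the segment.
    print(haplotype_pairs(cn, [1, 2, 3, 4]))  # [(1, 4), (2, 3)]
```

In this example, a diploid copy number of 5 is consistent with both a 1+4 and a 2+3 genotype. This is one reason why identifying the structural alleles, and the haplotypes they form with nearby SNPs, matters for association analysis. Estimates that fall close to a half-integer are better treated as uncertain than forced to a call.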
## Key facts

- **NIH application ID:** 10192865
- **Project number:** 2R56HG006855-09
- **Recipient organization:** HARVARD MEDICAL SCHOOL
- **Principal Investigator:** Steven Andrew McCarroll
- **Activity code:** R56
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $339,000
- **Award type:** 2 (competing renewal)
- **Project period:** 2012-08-18 → 2021-08-12

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10192865

## Citation

> US National Institutes of Health, RePORTER application 10192865, Multi-allelic forms of human genome structural variation (2R56HG006855-09). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10192865. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
