# Structurally complex genome loci in human populations and human phenotypes

> **NIH R01** · HARVARD MEDICAL SCHOOL · 2024 · $710,192

## Abstract

Structurally complex loci (SCLs) are hotspots of genome dynamism whose relationship to human phenotypic
variation is unknown. SCLs have multiple segments of duplicated DNA sequence which can contain or flank
genes, exons, or regulatory elements; these repeated sequences recombine with one another to generate new
alleles by non-allelic homologous recombination and gene conversion, creating many functionally distinct
alleles with different gene dosages and/or protein structures.
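
The recombination mechanism described above can be illustrated with a toy model. The Python sketch below is an illustration only. It assumes a hypothetical locus in which one gene segment (`G`) sits between two direct copies of a repeat (`R`). Non-allelic homologous recombination is modeled as a single unequal crossover between misaligned repeat copies, and gene conversion is ignored. The segment labels and the function names `unequal_crossover` and `gene_dosage` are hypothetical and do not come from the project.

```python
from itertools import combinations_with_replacement

# Hypothetical segment labels: "R" is a repeated (duplicated) segment and
# "G" is a gene lying between two direct copies of that repeat.
REF = ("R", "G", "R")


def unequal_crossover(h1, h2, i, j):
    """Toy NAHR: cross over within repeat copy h1[i] and repeat copy h2[j].

    If the two repeat copies are not at homologous positions (e.g. the first
    R on one haplotype and the last R on the other), one product loses the
    intervening segments and the reciprocal product gains them.
    """
    if h1[i] != h2[j]:
        raise ValueError("crossover must occur between copies of the same repeat")
    return h1[:i] + h2[j:], h2[:j] + h1[i:]


def gene_dosage(haplotype, gene="G"):
    """Number of copies of the gene segment carried by one haplotype."""
    return haplotype.count(gene)


deletion, duplication = unequal_crossover(REF, REF, 0, 2)
alleles = {"ref": REF, "del": deletion, "dup": duplication}

# Diploid gene dosage for every unordered pair of alleles.
for a, b in combinations_with_replacement(alleles, 2):
    print(f"{a}/{b}: {gene_dosage(alleles[a]) + gene_dosage(alleles[b])} copies")
```

In this toy model, the ref/ref and del/dup genotypes both carry two gene copies. This is one reason why diploid copy number alone often cannot identify the underlying alleles.
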

For most SCLs, human genetics does not yet know which alleles are present, nor how those alleles relate to human
phenotypic variation. Genetic variation at SCLs tends to involve many alleles, to be hard to assemble, and to
have complex relationships to nearby SNPs and SNP haplotypes. Yet SCLs offer a real opportunity to relate
phenotypes to allelic series of functional alleles with interpretable effects on gene dosage or protein domain
structure.

In this work, we will develop ways to ascertain the allelic series that make up SCLs across the genome and to
relate these alleles to a diverse set of human phenotypes. To do this, we will combine data from many forms of
genome analysis: definitive long-read data (n ~10² and growing), whole-genome and whole-exome sequence
data (n ~10⁴–10⁵), and SNP array data (n ~10⁵–10⁷), together with companion phenotype data.

In Aim 1, we will develop methods to reveal the full spectrum of variation at SCLs. We will (a) identify variable
DNA features and the ways in which these features vary and co-distribute across thousands of people of
diverse ancestries, and (b) find the underlying alleles and allele frequencies that explain this population-scale
variation.
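
Aim 1(b) involves inferring allele frequencies from measurements that do not directly reveal alleles. The sketch below is a simplified stand-in and not the project's method. It uses expectation-maximization (EM) to estimate allele frequencies from unphased diploid gene copy numbers and assumes:

- Hardy-Weinberg equilibrium.
- Error-free copy-number calls.
- Each allele is defined only by its gene copy number.

Under these assumptions, structurally different alleles with the same dosage could not be told apart; separating them would require additional DNA features. The function name `em_allele_frequencies` and the example counts are hypothetical.

```python
from collections import Counter
from itertools import product


def em_allele_frequencies(observed_totals, allele_copy_numbers,
                          max_iter=500, tol=1e-10):
    """Estimate allele frequencies from unphased diploid gene copy numbers.

    Toy assumptions: Hardy-Weinberg equilibrium, error-free copy-number
    calls, and alleles (indexed 0..K-1) defined only by their copy number.
    """
    k = len(allele_copy_numbers)
    counts = Counter(observed_totals)
    if not counts:
        raise ValueError("no observations")
    n_people = sum(counts.values())

    # Ordered allele pairs (a, b) whose copy numbers sum to each observed total.
    compatible = {}
    for total in counts:
        pairs = [(a, b) for a, b in product(range(k), repeat=2)
                 if allele_copy_numbers[a] + allele_copy_numbers[b] == total]
        if not pairs:
            raise ValueError(f"no allele pair explains copy number {total}")
        compatible[total] = pairs

    freqs = [1.0 / k] * k  # uniform starting point
    for _ in range(max_iter):
        # E-step: split each person among compatible genotypes in proportion
        # to their Hardy-Weinberg probability p_a * p_b, tallying allele copies.
        expected = [0.0] * k
        for total, n_obs in counts.items():
            weights = [freqs[a] * freqs[b] for a, b in compatible[total]]
            z = sum(weights)
            for (a, b), w in zip(compatible[total], weights):
                expected[a] += n_obs * w / z
                expected[b] += n_obs * w / z
        # M-step: frequency = expected allele copies / (2 * number of people).
        new_freqs = [e / (2 * n_people) for e in expected]
        done = max(abs(x - y) for x, y in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if done:
            break
    return freqs


# Illustrative counts (not real data): Hardy-Weinberg expectations for alleles
# carrying 0, 1 or 2 gene copies at frequencies 0.2, 0.5, 0.3 in 10,000 people.
observed = [0] * 400 + [1] * 2000 + [2] * 3700 + [3] * 3000 + [4] * 900
print([round(p, 3) for p in em_allele_frequencies(observed, [0, 1, 2])])
```

Only people with a diploid total of 2 are ambiguous here, because both 1+1 and 0+2 produce that total. EM resolves the ambiguity by iteratively reweighting the compatible genotypes.
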

In Aim 2, we will enable powerful genotype-phenotype analyses that leverage vast existing SNP data sets; we
will do this by creating large panels of reference haplotypes of SCL alleles and surrounding SNPs, and
advancing methods for imputing SCL alleles into SNP data.
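
The idea of imputing SCL alleles from reference haplotypes can be sketched in its crudest form. Production imputation tools typically use hidden Markov models over reference haplotypes. The sketch below substitutes a much simpler nearest-neighbor vote and assumes:

- Target haplotypes are phased.
- SNP alleles are coded as 0/1 strings at the same flanking positions as the panel.
- The panel is hypothetical, and each reference haplotype's SCL allele is already known (for example, from long-read data).

The function names, panel, and allele labels are illustrative only.

```python
from collections import Counter


def hamming(x, y):
    """Number of SNP positions at which two haplotypes differ."""
    if len(x) != len(y):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(a != b for a, b in zip(x, y))


def impute_scl_allele(target_snps, reference_panel, k=3):
    """Guess the SCL allele on one phased haplotype from its flanking SNPs.

    reference_panel: list of (snp_haplotype, scl_allele) pairs.
    Returns (most common SCL allele among the k most similar reference
    haplotypes, fraction of those haplotypes that carry it).
    """
    ranked = sorted(reference_panel, key=lambda ref: hamming(target_snps, ref[0]))
    nearest = ranked[:k]
    allele, n = Counter(scl for _, scl in nearest).most_common(1)[0]
    return allele, n / len(nearest)


# Hypothetical reference panel: flanking-SNP haplotypes with known SCL alleles.
REFERENCE_PANEL = [
    ("0010110", "del"),
    ("0010111", "del"),
    ("1101000", "dup"),
    ("1101001", "dup"),
    ("0110100", "ref"),
    ("0110110", "ref"),
]
print(impute_scl_allele("0010100", REFERENCE_PANEL))
```

The returned fraction is only a crude agreement score and not a calibrated posterior probability.
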

In Aim 3, we will advance approaches for genetic association analysis and fine-mapping at SCLs, and explore
the functional consequences of SCLs on quantitative traits and disease risk.
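
A basic form of association analysis at an SCL regresses a quantitative trait on gene dosage. The sketch below is a minimal ordinary-least-squares test of an additive dosage effect. It assumes per-person diploid dosages, which could be imputed fractional values, and a quantitative phenotype. It omits covariates such as ancestry principal components and does not attempt fine-mapping or per-allele effects. The function name `dosage_association` and the example data are made up for illustration.

```python
import math


def dosage_association(dosages, phenotypes):
    """OLS test of phenotype ~ intercept + beta * dosage.

    Returns (beta, standard_error, t_statistic) for the dosage slope.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("dosage does not vary, so its effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    # Residual variance uses n - 2 degrees of freedom (intercept and slope).
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se if se > 0 else math.copysign(math.inf, beta)
    return beta, se, t


# Made-up example: diploid gene copy number 0-4 and a quantitative trait.
dosages = [0, 1, 2, 3, 4]
phenotypes = [1.1, 1.9, 3.2, 3.8, 5.0]
beta, se, t = dosage_association(dosages, phenotypes)
print(f"beta={beta:.3f} se={se:.4f} t={t:.1f}")
```

Testing an allelic series rather than a single dosage term would mean giving each allele its own effect. This sketch deliberately does not attempt that.
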

We aspire to make and enable many more discoveries about how allelic series at structurally complex loci shape
human phenotypes.

## Key facts

- **NIH application ID:** 10853061
- **Project number:** 5R01HG006855-12
- **Recipient organization:** HARVARD MEDICAL SCHOOL
- **Principal Investigator:** Steven Andrew McCarroll
- **Activity code:** R01
- **Funding institute:** NIH (National Human Genome Research Institute, institute code HG)
- **Fiscal year:** 2024
- **Award amount:** $710,192
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2012-08-18 → 2027-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10853061

## Citation

> US National Institutes of Health, RePORTER application 10853061, Structurally complex genome loci in human populations and human phenotypes (5R01HG006855-12). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10853061. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
