# Detection and genotyping complex human genetic variation using single-molecule sequencing

> **NIH R01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2021 · $412,500

## Abstract

Although single-molecule sequencing (SMS) technologies have advanced in recent years to enable routine
sequencing and assembly of human genomes, new software is required to utilize the potential of SMS in human
genetics. The long-term goal is to help improve our understanding of complex variation in human diversity and
its role in disease. To achieve this, we will develop methods to (1) detect variation in SMS reads, (2) assemble
duplicated sequences missing from SMS de novo assemblies, and (3) genotype complex variation in large
high-throughput sequencing (HTS) datasets using lightweight data structures. Several years of algorithm
development for SMS data have produced a software ecosystem for detecting variation in SMS genomes, but
continued development is needed for three reasons: sensitivity and specificity are not yet sufficient for
disease studies; important classes of variation are not resolved by current assembly approaches; and the
knowledge gained from sequencing SMS genomes must be used to improve what can be discovered in large disease
studies that rely heavily on short-read data, such as those conducted under TOPMed. The algorithmic
innovations we will provide for SMS data
are an alignment algorithm that explicitly optimizes over rearranged sequences, and an assembly approach that
exploits minor differences between duplication copies to resolve genome function. Software will be supported
through Bioconda installation and distributed test cases. Once a variant is discovered by SMS, it may be more
easily genotyped in short-read data. We will develop methods to generate databases of SMS variation that may
be queried with short-read data. To aid in development of assembly algorithms for duplicated sequences, we will
generate a public resource of SMS data for individuals with known copy number polymorphisms. The significance
of this work is to enable SMS genomes to be used in disease studies, both by uncovering previously hidden
variation, and by increasing the amount of variation found in large short-read datasets.
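
The idea of querying a database of SMS-discovered variation with short-read data can be illustrated with a toy k-mer lookup. This is only a minimal sketch under stated assumptions, not the project's method: the function names (`kmers`, `build_allele_index`, `count_support`), the allele sequences, and the choice of `k = 5` are all hypothetical, and the sketch ignores reverse complements and sequencing errors.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_allele_index(alleles, k):
    """Map each k-mer found in exactly one allele to that allele's name.

    `alleles` is a hypothetical dict of allele name -> sequence, standing in
    for variant alleles discovered from long reads.
    """
    owners = {}
    for name, seq in alleles.items():
        for km in set(kmers(seq, k)):
            owners.setdefault(km, set()).add(name)
    # Keep only k-mers that distinguish one allele from all others.
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}


def count_support(reads, index, k):
    """Count, per allele, how many allele-specific k-mers occur in the reads."""
    support = Counter()
    for read in reads:
        for km in kmers(read, k):
            allele = index.get(km)
            if allele is not None:
                support[allele] += 1
    return support


# Toy data: a reference allele and an alternate allele carrying an insertion.
alleles = {"ref": "AAACCCGGGTTT", "alt": "AAACCCTATAGGGTTT"}
index = build_allele_index(alleles, k=5)

# Two short "reads", one spanning each allele's breakpoint.
reads = ["CCCTATAGG", "ACCCGGGT"]
support = count_support(reads, index, k=5)
print(support)
```

In this toy case both alleles receive k-mer support, which a real genotyper might interpret as a heterozygous call after modeling coverage and error; that modeling step is outside the scope of this sketch.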

## Key facts

- **NIH application ID:** 10186109
- **Project number:** 1R01HG011649-01
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Mark Chaisson
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $412,500
- **Award type:** 1
- **Project period:** 2021-07-15 → 2026-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10186109

## Citation

> US National Institutes of Health, RePORTER application 10186109, Detection and genotyping complex human genetic variation using single-molecule sequencing (1R01HG011649-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10186109. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*