# High throughput single molecule approaches for phased genome sequence assembly

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN FRANCISCO · 2020 · $650,000

## Abstract

It is possible to combine technologies based on single molecules to achieve de novo genome
sequence assembly with phasing and genome-wide structural variation identification. De novo
assembled whole genome sequencing fully describes the diploid human genome except for a
small number of long, highly repetitive sequences such as the centromeres, telomeres, and
near-identical segmental duplications. Key to the success of phased genome sequence
assembly is the single molecule mapping approach originally developed by our group and
since improved by Bionano Genomics. The method starts with sequence-specific labeling of long
(180 kb to >1 Mb), double-stranded genomic DNA fragments with fluorophores followed by
high-throughput, automated imaging and analysis of the linearized fluorescent DNA molecules in
nanochannel arrays on a commercially available instrument. During the next phase of this
project, we propose to produce phased genome sequence assemblies of two individuals from
each of the 26 ethnic groups of the 1000 Genomes Project to serve as general references for the
community. In addition, we will further develop the single molecule labeling technology to map
repetitive elements that are difficult to interrogate genome-wide and to precisely phase
long-range target regions. The approach we are taking to construct de novo phased and assembled
genomes will produce “near reference grade” genomes with high efficiency and at low cost for
many ethnic groups around the world. These reference sequences will substantially increase
the value of all the whole genome sequences already obtained and provide further insight into
structural variation patterns across human populations. The technology development aims of
this proposal will address some of the most difficult questions facing genome analysis today. At
the end of this four-year project, a robust method for phased genome assembly, repetitive
sequence mapping, and long-range phasing will be developed and ready for application in many
areas of genome research.

## Key facts

- **NIH application ID:** 9952403
- **Project number:** 5R01HG005946-09
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
- **Principal Investigator:** Pui-Yan KWOK
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $650,000
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2010-09-27 → 2022-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9952403

## Citation

> US National Institutes of Health, RePORTER application 9952403, High throughput single molecule approaches for phased genome sequence assembly (5R01HG005946-09). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9952403. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
