# IMPROVING THROUGHPUT OF LONG READS WITH HIGH CONSENSUS BASE ACCURACY TO RESOLVE REPETITIVE DNAS

> **NIH R21** · UNIVERSITY OF CALIFORNIA SANTA CRUZ · 2020 · $306,007

## Abstract

Approaches to complete the human genome will benefit from careful, benchmarked advances
that demonstrate the capability to fully assemble and phase diploid chromosomes. The
remaining unresolved regions in our high-resolution genomic maps are known to contain long
tracts of repeats. The long-term objective of our research is to develop new experimental
methods to complete chromosome-scale assemblies to study the sequence organization,
structural diversity, and disease impact of these novel sequences. The goal of this proposal is to
develop sequencing methods to improve the sequence throughput of reads that are hundreds of
kilobases in length to improve consensus base accuracy, a necessary step to advance
assembly efforts into the remaining gapped regions. In our first aim, we demonstrate the use of
our approach to generate the first telomere-to-telomere phased assembly of a human
chromosome. We hypothesize that this work will be critical to complete other chromosome
reference assemblies. In our second aim, we present a new approach to target and enrich for
long reads in repetitive DNAs that were misrepresented or missing completely in
previous reference assemblies. This approach is expected to provide a new cost-effective
method of studying genetic variation in these highly variable regions from a large number of
individuals. This research has the additional benefit that it will add new sequence to the human
genome to systematically explore genetic variation of regions frequently overlooked as part of
disease-association studies.

## Key facts

- **NIH application ID:** 9920185
- **Project number:** 5R21HG010548-02
- **Recipient organization:** UNIVERSITY OF CALIFORNIA SANTA CRUZ
- **Principal Investigator:** Karen Hayden Miga
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $306,007
- **Award type:** 5
- **Project period:** 2019-04-23 → 2023-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9920185

## Citation

> US National Institutes of Health, RePORTER application 9920185, IMPROVING THROUGHPUT OF LONG READS WITH HIGH CONSENSUS BASE ACCURACY TO RESOLVE REPETITIVE DNAS (5R21HG010548-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9920185. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
