# Robust and cost-effective computational methods for haplotype-resolved genome assemblies

> **NIH R00** · YALE UNIVERSITY · 2024 · $249,000

## Abstract

Background: De novo haplotype-resolved genome assembly not only plays a critical role in the studies of novel species,
but also is the most comprehensive solution to discover structural variants and understand repeat-rich regions of the human
genome. Moreover, haplotype-resolved assemblies are the fundamental infrastructures for various pangenome references.
Recent advances in accurate long-read sequencing technologies open the opportunity to faithfully build high-quality haplotype-resolved
assemblies, but most assembly algorithms could not take full advantage of the emerging accurate long-read data.
To this end, I have developed a graph-based haplotype-resolved genome assembly algorithm, called hifiasm, which combines
accurate long reads with the additional data providing long-range phasing information. Hifiasm has been widely used by
multiple large-scale sequencing projects, such as the Human Pangenome Reference Consortium (HPRC), the Genome in a
Bottle (GIAB), the Vertebrate Genomes Project (VGP), and the Darwin Tree of Life project. Based on hifiasm, this proposal
focuses on developing a set of new haplotype-resolved assembly algorithms to further improve the assembly quality for
complex regions and genomes, as well as substantially reduce the assembly cost.

Research: My first aim is to develop a hybrid algorithm to produce high-quality haplotype-resolved assemblies for diploid
genomes, especially focusing on resolving highly repetitive regions like centromeres. The proposed algorithm will combine
the advantages of length and accuracy from different types of long-read data to automatically reconstruct the last unexplored
repeat-rich regions of the genome. In the second aim, I will develop a haplotype-aware scaffolding algorithm to achieve
chromosome-level haplotype-resolved assemblies for diploid genomes. In the third aim, I will propose different strategies to
reduce the sequencing cost and the computational cost of the haplotype-resolved assembly, making it feasible for population-scale
studies. I will also develop assembly algorithms to resolve complex genomes with more than two haplotypes. Upon
completion, the proposed studies will offer efficient assembly tools for large-scale sequencing projects, and will pave the way
to personal genome assembly for genomic research and clinical applications.

Career development and training: My long-term career goal is to lead an independent research group focusing on
developing novel computational methods for haplotype-resolved assemblies and the relevant applications. In addition to
further enhancing my training in computational method development with my mentor Dr. Heng Li, I will obtain systematic
training in biomedical research from the advisory committee (Dr. Erich D. Jarvis and Dr. Scott V. Edwards for human and
non-human genomes, Dr. Evan E. Eichler and Dr. Karen H. Miga for repeats and structural variations, as well as Dr. Matthew
Meyerson for complex genomes including not only two h...

## Key facts

- **NIH application ID:** 11179519
- **Project number:** 4R00HG012798-03
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** Haoyu Cheng
- **Activity code:** R00
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $249,000
- **Award type:** 4N
- **Project period:** 2023-02-13 → 2027-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11179519

## Citation

> US National Institutes of Health, RePORTER application 11179519, Robust and cost-effective computational methods for haplotype-resolved genome assemblies (4R00HG012798-03). Retrieved via AI Analytics 2026-06-12 from https://api.ai-analytics.org/grant/nih/11179519. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
