# Advanced computational methods in analyzing high-throughput sequencing data

> **NIH R01** · DANA-FARBER CANCER INST · 2022 · $344,300

## Abstract

High-performance computational algorithms are essential to the analysis of large-scale biological sequence data
and have received broad attention. Developed several years or even more than a decade ago, many mainstream
software packages for sequence alignment, assembly and genome annotation do not take full advantage of
modern accurate long-read data or cannot keep up with the throughput of current technologies. The development
of advanced algorithms is critical to the applications of sequencing technologies in the near future. Based on our
work in the previous funding cycle, this project will address this pressing need with four proposals: (1) developing
an alignment algorithm for accurate long reads and high-quality assemblies for more comprehensive alignment
through highly repetitive regions and long segmental duplications; (2) extending our hifiasm assembler to the
high-quality assembly of the more accurate Oxford Nanopore reads now available; (3) combining our hifiasm
and dipasm algorithms for more accurate and more contiguous haplotype-resolved assembly without pedigree
data; (4) developing a protein-to-genome aligner to assist large-scale gene annotation of new species. Upon
completion, the proposed studies will result in high-performance, user-facing tools for sequence alignment and
assembly that are faster and more accurate than the current generation.

## Key facts

- **NIH application ID:** 10367263
- **Project number:** 2R01HG010040-06
- **Recipient organization:** DANA-FARBER CANCER INST
- **Principal Investigator:** Heng Li
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $344,300
- **Award type:** 2 (competing renewal)
- **Project period:** 2018-05-01 → 2027-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10367263

## Citation

> US National Institutes of Health, RePORTER application 10367263, Advanced computational methods in analyzing high-throughput sequencing data (2R01HG010040-06). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10367263. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
