Advanced computational methods in analyzing high-throughput sequencing data

NIH RePORTER · NIH · R01 · $344,300

Abstract

High-performance computational algorithms are essential to the analysis of large-scale biological sequence data and have received broad attention. Many mainstream software packages for sequence alignment, assembly and genome annotation were developed several years ago, some more than a decade ago, and either do not take full advantage of modern accurate long-read data or cannot keep up with the throughput of current technologies. The development of advanced algorithms is critical to the applications of sequencing technologies in the near future. Building on our work in the previous funding cycle, this project will address this pressing need with four proposals: (1) developing an alignment algorithm for accurate long reads and high-quality assemblies that aligns more comprehensively through highly repetitive regions and long segmental duplications; (2) extending our hifiasm assembler to high-quality assembly of the more accurate Oxford Nanopore reads now available; (3) combining our hifiasm and dipasm algorithms for more accurate and more contiguous haplotype-resolved assembly without pedigree data; (4) developing a protein-to-genome aligner to assist large-scale gene annotation of new species. Upon completion, the proposed studies will result in high-performance, user-facing tools for sequence alignment and assembly that are faster and more accurate than the current generation.

Key facts

NIH application ID: 10367263
Project number: 2R01HG010040-06
Recipient: DANA-FARBER CANCER INST
Principal Investigator: Heng Li
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $344,300
Award type: 2
Project period: 2018-05-01 → 2027-02-28