Advanced computational methods in analyzing high-throughput sequencing data

NIH RePORTER · NIH · R01 · $344,300

Abstract

PROJECT SUMMARY

High-performance computational algorithms are essential to the analysis of large-scale biological sequence data and have received broad attention. Many mainstream software packages for sequence alignment, assembly and genome annotation were developed several years, or even more than a decade, ago. They either do not take full advantage of modern accurate long-read data or cannot keep up with the throughput of current technologies. Developing advanced algorithms is therefore critical to applying sequencing technologies in the near future. Building on our work in the previous funding cycle, this project will address this pressing need with four proposals: (1) developing an alignment algorithm for accurate long reads and high-quality assemblies that aligns more comprehensively through highly repetitive regions and long segmental duplications; (2) extending our hifiasm assembler to produce high-quality assemblies from the more accurate Oxford Nanopore reads now available; (3) combining our hifiasm and dipasm algorithms for more accurate and more contiguous haplotype-resolved assembly without pedigree data; and (4) developing a protein-to-genome aligner to assist large-scale gene annotation of new species. Upon completion, the proposed studies will result in high-performance, user-facing tools for sequence alignment and assembly that are faster and more accurate than the current generation.

Key facts

NIH application ID: 10367263
Project number: 2R01HG010040-06
Recipient: DANA-FARBER CANCER INST
Principal Investigator: Heng Li
Activity code: R01
Funding institute: NIH
Fiscal year: 2022
Award amount: $344,300
Award type: 2 (competing renewal)
Project period: 2018-05-01 → 2027-02-28