# Gfastar: a C++ library and a tool suite to aid Telomere-to-Telomere genome assembly

> **NIH R03** · ROCKEFELLER UNIVERSITY · 2024 · $169,500

## Abstract

The recent completion of a Telomere-to-Telomere (T2T) human genome has demonstrated that, in principle,
existing sequencing technologies allow gapless and nearly error-free assembly of complex, human-sized
genomes. Despite these technological advancements, genome assemblies that are currently being generated
and released in public archives are still incomplete and contain a significant number of errors, which can
dramatically impact downstream analyses. Algorithms that can generate T2T genomes are still in their infancy.
The few that are available require extensive manual validation and curation and have so far worked on only a
handful of model species. Dedicated algorithms and software tools are essential for achieving T2T assembly
completeness and accuracy in all species. In particular, extensive evaluation and sophisticated manipulation of
genome assembly graphs are required for T2T genome assembly. An efficient tool suite for this purpose is missing.
To bridge this gap, gfastar, a suite of algorithms and tools created for the evaluation and manipulation of
assembly graphs, will be further advanced and continuously maintained. Gfastar is under active development,
and it is currently used by large-scale initiatives aimed at the generation of high-quality reference genomes, such
as the Vertebrate Genomes Project. Gfastar is powered by a dedicated C++ library, gfalibs. Gfalibs will be
expanded to provide a comprehensive library dedicated to genome sequences and assembly graphs that can
support multiple file formats commonly used by the genome assembly community (e.g. FASTA, FASTQ, GFA1/2,
AGP, GAF, BAM, and FASTG), parallelized input/output (I/O) processing, and many other general-purpose
functions and utilities. This library will be extensively used by the whole gfastar software ecosystem (rdeval,
gfastats, gfalign, kcount, kreeq, teloscope, and gfase). Several modules have already been
implemented in gfastar. These existing modules will be expanded with additional functionalities, and new tools
will be developed. All these tools will synergistically contribute to the generation of T2T reference genomes at
scale. As a whole, the gfastar tool suite will provide unparalleled algorithms and functionalities for assembly
graph evaluation, manipulation and analysis, significantly supporting the genomic community by helping improve
the completeness and accuracy of genomes.

## Key facts

- **NIH application ID:** 10987122
- **Project number:** 1R03HG013362-01A1
- **Recipient organization:** ROCKEFELLER UNIVERSITY
- **Principal Investigator:** Giulio Formenti
- **Activity code:** R03
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $169,500
- **Award type:** 1
- **Project period:** 2024-09-25 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10987122

## Citation

> US National Institutes of Health, RePORTER application 10987122, Gfastar: a C++ library and a tool suite to aid Telomere-to-Telomere genome assembly (1R03HG013362-01A1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10987122. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
