Elements: Real-Time, Incremental, and Sustainable Sequence Search over SRA

NSF Award Search · 01002526DB NSF RESEARCH & RELATED ACTIVIT · $475,000

Abstract

The Sequence Read Archive (SRA) is a vast but underutilized repository of genomic and related data, containing the majority of publicly available sequencing experiments in raw, unassembled format. Scientists could leverage this resource to search for newly discovered genes across the entire collection of existing public experiments, enabling rapid functional characterization and enhanced biological insights that would otherwise require extensive individual dataset analysis. However, building a sustainable and scalable index over the SRA presents significant challenges due to its massive size, continuous daily growth, and diverse data types. This project addresses these issues by developing new indexing tools that will enable scientists to search the entire SRA in real time. To further enable broad usage, the project provides a hosted, web-accessible version of the calculated indices.

This project develops a real-time, scalable sequence search index for SRA using innovative data structures, compression algorithms, and distributed indexing approaches designed for cost-effective deployment on commodity infrastructure. This project has three main thrusts. First, it extends the previously developed Mantis index to efficiently index abundance, positions, and experiment metadata while maintaining the original performance and scalability. Second, it develops a dynamic and distributed version of Mantis to scale out and incrementally index newly deposited experiments and support re

Key facts

NSF award ID: 2513656
Awardee: Northeastern University (MA)
SAM.gov UEI: HLTMVS2JZBS6
PI: Prashant Pandey
Primary program: 01002526DB NSF RESEARCH & RELATED ACTIVIT
All programs: Software Institutes, INTERDISCIPLINARY PROPOSALS, SMALL PROJECT, ADVANCES IN BIO INFORMATICS
Estimated total: $475,000
Funds obligated: $475,000
Transaction type: Standard Grant
Period: 09/01/2025 → 08/31/2028