# Elements: Real-Time, Incremental, and Sustainable Sequence Search over SRA

> **NSF 01002526DB NSF RESEARCH & RELATED ACTIVIT** · Northeastern University (MA) · $475,000

## Abstract

The Sequence Read Archive (SRA) is a vast but underutilized repository of genomic and related data, containing the majority of publicly available sequencing experiments in raw, unassembled format. Scientists could leverage this resource to search for newly discovered genes across the entire collection of existing public experiments, enabling rapid functional characterization and enhanced biological insights that would otherwise require extensive individual dataset analysis. However, building a sustainable and scalable index over the SRA presents significant challenges due to its massive size, continuous daily growth, and diverse data types. This project addresses these issues by developing new indexing tools that will enable scientists to search the entire SRA in real time. To further enable broad usage, the project provides a hosted, web-accessible version of the calculated indices.

This project develops a real-time, scalable sequence search index for SRA using innovative data structures, compression algorithms, and distributed indexing approaches designed for cost-effective deployment on commodity infrastructure. The work has three main thrusts. First, it extends the previously developed Mantis index to efficiently index abundance, positions, and experiment metadata while maintaining the original performance and scalability. Second, it develops a dynamic and distributed version of Mantis to scale out and incrementally index newly deposited experiments and support re
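To make the idea of an incrementally updated sequence search index concrete, here is a minimal Python sketch. It is not Mantis and not the project's design: it is a toy inverted index from k-mers to experiment IDs, assuming plain in-memory sets where a real system would use compact, compressed structures. The names `ToyKmerIndex`, `add_experiment`, `search`, and the threshold parameter `theta` are all hypothetical, and the sketch ignores reverse complements, abundance, positions, and metadata.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class ToyKmerIndex:
    """Toy inverted index: k-mer -> set of experiment IDs (illustrative only)."""

    def __init__(self, k):
        self.k = k
        self.postings = defaultdict(set)

    def add_experiment(self, exp_id, reads):
        """Index one new experiment without rebuilding existing entries."""
        for read in reads:
            for km in kmers(read, self.k):
                self.postings[km].add(exp_id)

    def search(self, query, theta=0.8):
        """Map each experiment holding at least a theta fraction of the
        query's distinct k-mers to the fraction it matched."""
        query_kmers = set(kmers(query, self.k))
        if not query_kmers:
            return {}
        hits = defaultdict(int)
        for km in query_kmers:
            # .get avoids creating empty postings for unseen k-mers
            for exp_id in self.postings.get(km, ()):
                hits[exp_id] += 1
        total = len(query_kmers)
        return {e: c / total for e, c in hits.items() if c / total >= theta}


# Hypothetical experiment IDs and reads, for illustration only.
index = ToyKmerIndex(k=4)
index.add_experiment("exp1", ["ACGTACGT"])
index.add_experiment("exp2", ["TTTTGGGG"])
result = index.search("ACGTAC")

# A newly deposited experiment is added in place; earlier entries are untouched.
index.add_experiment("exp3", ["CGTACCC"])
loose_result = index.search("ACGTAC", theta=0.5)
```

Here `result` contains only `exp1`, since all three query k-mers occur in it, while the looser threshold in `loose_result` also admits `exp3`, which shares two of the three. The per-k-mer sets grow linearly with the number of experiments, which is one reason an index at SRA scale would need the compression and distribution the abstract describes.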

## Key facts

- **NSF award ID:** 2513656
- **Awardee organization:** Northeastern University (MA)
- **SAM.gov UEI:** HLTMVS2JZBS6
- **PI:** Prashant Pandey
- **Primary program:** 01002526DB NSF RESEARCH & RELATED ACTIVIT
- **All programs:** Software Institutes, INTERDISCIPLINARY PROPOSALS, SMALL PROJECT, ADVANCES IN BIO INFORMATICS
- **Estimated total:** $475,000
- **Funds obligated:** $475,000
- **Transaction type:** Standard Grant
- **Period:** 09/01/2025 → 08/31/2028

## Primary source

NSF Award Search: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2513656

## Citation

> US National Science Foundation, Award 2513656, Elements: Real-Time, Incremental, and Sustainable Sequence Search over SRA. Retrieved via AI Analytics 2026-06-06 from https://api.ai-analytics.org/grant/nsf/2513656. Licensed CC0.

---

*[NSF Awards dataset](/datasets/nsf-awards) · CC0 1.0*
