# Novel algorithm development, user support and maintenance for STAR

> **NIH R01** · COLD SPRING HARBOR LABORATORY · 2021 · $480,000

## Abstract

Sequencing of transcribed RNA molecules (RNA-seq) is an invaluable tool for studying cell transcriptomes at
high resolution and depth. STAR is a popular RNA-seq analysis suite that combines high mapping accuracy and
ultra-fast speed with a rich collection of built-in features and tools. STAR is used by hundreds of
researchers, including several major consortia and institutions. We propose to significantly enhance and
expand STAR's capabilities in the following important areas.
 1. Develop novel algorithms and tools integrated directly into STAR.
RNA-seq analyses require combining multiple tools into “processing pipelines”, which is a demanding task owing
to bottlenecks and compatibility issues. We aim to overcome these impediments by integrating novel tools
directly into the STAR software: (i) mapping of RNA-seq reads to personal genomes, utilizing genotype information
to produce more accurate allele-aware alignments and thus increasing the precision of personal genomics analyses;
and (ii) mapping of long RNA reads from emerging sequencing technologies such as PacBio and Oxford Nanopore.
 2. Increase the accuracy and speed of the core mapping algorithm.
New applications, such as personal genomics, require significant improvements in mapping accuracy. We will
enhance the STAR mapping algorithm with (i) spliced seed extension through mismatches and indels; and (ii) limited
local alignment of the read ends. The tremendous increase in sequencing throughput has placed significant
emphasis on the efficiency of computational algorithms. To keep up with this growth in sequencing
throughput, we will speed up the STAR algorithm with (i) vectorization of query-text comparisons using SIMD/SSE
instructions; and (ii) dynamic programming for seed stitching. The improvements in accuracy and speed will be
validated on both simulated and real RNA-seq data. Mapping accuracy depends strongly on choosing the best
mapping parameters for a particular dataset, so we will devise automated parameter optimization procedures to
eliminate guesswork in parameter selection.
 3. Enhance user-friendliness, user support/education, and software maintenance.
User-friendliness is crucial for making bioinformatics software useful to the broadest audience. We aim to
significantly enhance the user experience by developing STAR web user interfaces for both pre-run data input
and post-run exploration of results. To enable STAR analysis in the cloud, we will create STAR virtual machines
on the popular Amazon and Google cloud computing services, and develop Hadoop-based tools for distributed
processing of big datasets. We will also expand user support and education, continue to implement
user-requested features, and debug user-reported issues.
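
The abstract does not say how genotype information would enter the alignment. As a hedged illustration only, the sketch below shows one simple form of allele-aware comparison: at positions with known variant alleles, either the reference base or a listed alternative base counts as a match, so reads carrying the individual's alternative allele are not penalized. The function `count_mismatches` and the `known_alleles` map (a hypothetical stand-in for genotype calls, e.g. parsed from a VCF file) are assumptions for this example, the comparison is ungapped, and none of it is STAR's actual implementation.

```python
from typing import Dict, Set


def count_mismatches(read: str, reference: str, start: int,
                     known_alleles: Dict[int, Set[str]]) -> int:
    """Count mismatches of `read` placed ungapped at `start` on `reference`.

    `known_alleles` is a hypothetical map from 0-based genome position to the
    alternative alleles carried by this individual. At those positions, the
    reference base and any listed alternative base are all accepted.
    """
    if start < 0 or start + len(read) > len(reference):
        raise ValueError("read does not fit on the reference at this start")
    mismatches = 0
    for offset, base in enumerate(read):
        pos = start + offset
        # The reference base is always allowed; known variants add alternatives.
        allowed = {reference[pos]} | known_alleles.get(pos, set())
        if base not in allowed:
            mismatches += 1
    return mismatches
```

For example, the read `ACGAAC` placed at position 0 of `ACGTACGTAC` has one mismatch (an `A` against the reference `T` at position 3) without genotype information, and none once `{3: {"A"}}` records that allele as known.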

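The abstract names dynamic programming for seed stitching but gives no formulation. The sketch below is a hedged, generic example of seed chaining by dynamic programming under stated assumptions. Each seed is an exact match between the read and the genome. A seed may follow another only if both its read and genome coordinates come after it. A long genome-only gap is scored as a splice junction with a fixed penalty, and shorter length differences are scored as indels. The names `Seed`, `stitch_seeds`, `MIN_INTRON`, `SPLICE_PENALTY` and `INDEL_PENALTY`, and the scoring scheme itself, are hypothetical and do not reflect STAR's scoring.

```python
from typing import List, NamedTuple, Optional, Tuple


class Seed(NamedTuple):
    read_start: int    # 0-based start of the exact match in the read
    genome_start: int  # 0-based start of the exact match in the genome
    length: int        # match length (> 0)


# Hypothetical scoring constants, chosen only for illustration.
MIN_INTRON = 21      # genome-minus-read gaps at least this long count as junctions
SPLICE_PENALTY = 8   # fixed cost of a splice junction
INDEL_PENALTY = 2    # cost per base of length difference for shorter gaps


def gap_penalty(a: Seed, b: Seed) -> Optional[int]:
    """Penalty for placing seed b after seed a, or None if b cannot follow a."""
    read_gap = b.read_start - (a.read_start + a.length)
    genome_gap = b.genome_start - (a.genome_start + a.length)
    if read_gap < 0 or genome_gap < 0:
        return None  # overlapping or non-colinear seeds cannot be stitched
    diff = genome_gap - read_gap
    if diff >= MIN_INTRON:
        return SPLICE_PENALTY
    # Simplification: bases inside equal-length gaps are not scored at all.
    return abs(diff) * INDEL_PENALTY


def stitch_seeds(seeds: List[Seed]) -> Tuple[int, List[Seed]]:
    """Return the best chain score and its seeds via O(n^2) dynamic programming."""
    if not seeds:
        return 0, []
    order = sorted(seeds, key=lambda s: (s.read_start, s.genome_start))
    best = [s.length for s in order]  # best[j]: top score of a chain ending at seed j
    prev: List[Optional[int]] = [None] * len(order)
    for j, b in enumerate(order):
        for i in range(j):
            pen = gap_penalty(order[i], b)
            if pen is None:
                continue
            cand = best[i] + b.length - pen
            if cand > best[j]:
                best[j] = cand
                prev[j] = i
    # Trace back from the highest-scoring chain end.
    end = max(range(len(order)), key=best.__getitem__)
    chain: List[Seed] = []
    k: Optional[int] = end
    while k is not None:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

For the seeds `Seed(0, 1000, 30)`, `Seed(30, 1530, 40)` and `Seed(10, 5000, 20)`, the sketch stitches the first two across a 500-base genomic gap (scored as one junction, 30 + 40 - 8 = 62) and drops the third seed, which overlaps the first in the read. Sorting by read start guarantees that every valid predecessor is processed before its successor, which is what makes the single forward pass correct.
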
## Key facts

- **NIH application ID:** 10167758
- **Project number:** 5R01HG009318-05
- **Recipient organization:** COLD SPRING HARBOR LABORATORY
- **Principal Investigator:** ALEXANDER DOBIN
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $480,000
- **Award type:** 5
- **Project period:** 2017-08-18 → 2023-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10167758

## Citation

> US National Institutes of Health, RePORTER application 10167758, Novel algorithm development, user support and maintenance for STAR (5R01HG009318-05). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10167758. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*