# Turning big data analysis infrastructure for HIV research

> **NIH R01** · PENNSYLVANIA STATE UNIVERSITY, THE · 2020 · $374,737

## Abstract

The COVID-19/SARS-CoV-2 pandemic is a once-in-a-generation, “all-hands-on-deck” event for the
scientific community. This pandemic is also the first in which real-time genomic data are available,
e.g. via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and
the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by
multiple laboratories. Yet we repeatedly encounter the same avoidable shortcomings early in viral
investigations, including COVID-19: lack of reproducibility, rigor, and data/analytic sharing. Only
about 10% of the published genomes have quality metrics, primary data (read files), or any level of
detail on analytics, making these data irreproducible and unverifiable; over 40% of GISAID
submissions to date provide no information about how the sequences were generated. Essential
questions about the extent of intra-host genomic variability (indicative of adaptation or multiple
infection), viral evolution (selection, recombination), and transmission (phylogenetic and
phylogeographic) cannot be answered reliably if researchers cannot trust or replicate the source data
and analytical approaches. Key deliverables of this supplement will be open analytic workflows
that can be used to curate and standardize genomic data, and high-quality annotated variation
data.

## Key facts

- **NIH application ID:** 10148893
- **Project number:** 3R01AI134384-04S1
- **Recipient organization:** PENNSYLVANIA STATE UNIVERSITY, THE
- **Principal Investigator:** ANTON NEKRUTENKO
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $374,737
- **Award type:** 3
- **Project period:** 2020-07-09 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10148893

## Citation

> US National Institutes of Health, RePORTER application 10148893, Turning big data analysis infrastructure for HIV research (3R01AI134384-04S1). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10148893. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
