# Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL)

> **NIH U24** · JOHNS HOPKINS UNIVERSITY · 2022 · $2,000,000

## Abstract

Project Summary: The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab‐space (AnVIL) will power the next generation of computational genomic research. We will develop the AnVIL environment using the leading national‐scale cyberinfrastructure as the foundation supporting the most widely used analysis environments and frameworks vetted by NHGRI researchers. Our user‐centered solution for data access, analysis, and visualization will enable investigators across all levels of expertise to fully utilize genomic datasets using environments they are already familiar with, leveraging well‐engineered and optimized scientific computing infrastructure for greater efficiency and lower costs.

**Aim 1: Engineer the AnVIL Data and Compute Platform.** We will leverage the TACC Science Cloud and the Agave Science‐As‐A‐Service platform to deploy a cloud‐based environment supporting the data storage, access, and compute needs of the NHGRI research community.

**Aim 2: Develop APIs for Data and Compute Access.** To maximize the domain‐wide impact of AnVIL, we will draw on community efforts and our own collective experience supporting diverse genomic analyses to define access standards and to design and implement AnVIL APIs.

**Aim 3: Build an AnVIL metaportal integrating widely used analysis platforms.** We will create a single metaportal residing within TACC's Science Cloud providing a unified view of users' data and activities, provenance and billing, and access to several of the most widely used workbenches for genomic research. These workbenches include Bioconductor, Galaxy, the Genome Modeling System, Jupyter, and RStudio. The metaportal will also provide access to the most popular genomic visualization tools.

**Aim 4: Develop novel data aggregation, indexing, and query schemes to increase analysis efficiency and reduce cost.** We will build approaches, including indexing and pre‐computation of key statistics, to make better use of existing (e.g., TCGA, GTEx) and future large datasets, with the goal of increasing data utility and decreasing the cost of posing scientific queries against massive datasets.

**Aim 5: Develop training and outreach infrastructure and materials.** We will build support for training directly in the AnVIL platform, including tight coupling to MOOC‐style courses, self‐directed training materials, and support materials for conducting online and in‐person training workshops.

**Aim 6: Engage in effective project governance and assessment.** We will establish a leadership and management structure involving key stakeholders from NHGRI, including program staff and the NHGRI‐appointed Data Steering Committee and External Advisory Committee.

The key innovation of this work is our leveraging of existing hardware, software, and human resources to create a practical and pragmatic solution to the challenge of building the AnVIL.

## Key facts

- **NIH application ID:** 10450774
- **Project number:** 5U24HG010263-05
- **Recipient organization:** JOHNS HOPKINS UNIVERSITY
- **Principal Investigator:** VINCENT JAMES CAREY
- **Activity code:** U24
- **Funding institute:** NIH (NHGRI)
- **Fiscal year:** 2022
- **Award amount:** $2,000,000
- **Award type:** 5
- **Project period:** 2018-09-21 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10450774

## Citation

> US National Institutes of Health, RePORTER application 10450774, Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (5U24HG010263-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10450774. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
