# Expanding the AnVIL Data Ecosystem

> **NIH U24** · BROAD INSTITUTE, INC. · 2024 · $3,430,003

## Abstract

Five years ago the AnVIL was founded with a vision of creating a federated data ecosystem. Its first phase
focused on building the foundational capabilities needed to bring together data, tools, and research
communities in a cloud-based environment. Now, in this second phase, the focus must be on scientific impact.
We will emphasize growing the AnVIL data corpus, going multi-cloud, creating analytical tools for
flagship NHGRI initiatives, and increasing the user base. We will accomplish this through
the following Aims:

- **Aim 1 (Data Ingestion):** Support the ingestion, curation, and management of diverse datasets, so that they are accessible to the research community. In Phase I of the AnVIL, we ingested, wrangled, and QC’d more than 5 PB of data from NHGRI consortia. In Phase II, we will continue this track record of success in supporting consortia, and extend our efforts to support the long tail of individual researchers with valuable data to contribute to the AnVIL.
- **Aim 2 (Software Infrastructure):** Reduce barriers to entry by supporting multiple clouds and improving cost control. While Phase I of the AnVIL focused on establishing foundational software infrastructure, Phase II must be about scaling adoption of the AnVIL. We have a three-part strategy for achieving this: (i) becoming multi-cloud, so that we support Microsoft Azure in addition to Google Cloud; (ii) creating “AnVIL lite,” a simplified and free tier of the AnVIL that lowers barriers to entry; (iii) exposing tools to improve billing visibility and prevent overspend.
- **Aim 3 (Scientific Services):** Leverage the AnVIL’s datasets and platforms to accelerate scientific research. In Phase II, we must prioritize the scientific impact of the AnVIL. Towards this end, we will leverage: (i) an imputation service drawing on AnVIL datasets and other datasets of diverse ancestry; (ii) a newly developed genomic variant store to support joint calling; (iii) an improved and expanded capability for third-party deployment of tools and applications in the AnVIL.
- **Aim 4 (User Services):** Support the growth and long-term success of the research community through user support, training, and project management. The services that comprise the AnVIL are not only web services, but also human services. Meeting the needs of researchers everywhere requires security, user support, training, and project governance.

The guiding principle of our efforts is that progress in genomic data science will happen most rapidly if there is
a diversity of interoperable solutions created by a plurality of groups. Toward that end, we will
ensure that the AnVIL continues to drive towards interoperability and federation by participating in NIH-led and
international efforts focused on standard setting and data sharing.

## Key facts

- **NIH application ID:** 10918350
- **Project number:** 5U24HG010262-07
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Robert J Carroll
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $3,430,003
- **Award type:** 5
- **Project period:** 2018-09-19 → 2028-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10918350

## Citation

> US National Institutes of Health, RePORTER application 10918350, Expanding the AnVIL Data Ecosystem (5U24HG010262-07). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10918350. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
