Project Summary/Abstract
Five years ago the AnVIL was founded with a vision of creating a federated data ecosystem. Its first phase focused on building the foundational capabilities needed to bring together data, tools, and research communities in a cloud-based environment. Now, in this second phase, the focus must be on scientific impact. We will pursue four Aims that emphasize growing the AnVIL data corpus, going multi-cloud, creating analytical tools for flagship NHGRI initiatives, and increasing the user base:
● Aim1 (Data Ingestion): Support the ingestion, curation, and management of diverse datasets, so that they are accessible to the research community. In Phase I of the AnVIL, we ingested, wrangled, and QC’d more than 5 PB of data from NHGRI consortia. In Phase II, we will continue this track record of success in supporting consortia, and extend our efforts to support the long tail of individual researchers with valuable data to contribute to the AnVIL.
● Aim2 (Software Infrastructure): Reduce barriers to entry by supporting multiple clouds and improving cost control. While Phase I of the AnVIL focused on establishing foundational software infrastructure, Phase II must be about scaling adoption of the AnVIL. We have a three-part strategy for achieving this: (i) becoming multi-cloud, so that we support Microsoft Azure in addition to Google Cloud; (ii) creating “AnVIL lite,” a simplified and free tier of the AnVIL that lowers barriers to entry; (iii) exposing tools to improve billing visibility and prevent overspend.
● Aim3 (Scientific Services): Leverage the AnVIL’s datasets and platforms to accelerate scientific research. In Phase II, we must prioritize the scientific impact of the AnVIL.
Toward this end, we will leverage: (i) an imputation service drawing on AnVIL datasets and other datasets of diverse ancestry; (ii) a newly developed genomic variant store to support joint calling; (iii) an improved and expanded capability for third-party deployment of tools and applications in the AnVIL.
● Aim4 (User Services): Support the growth and long-term success of the research community through user support, training, and project management. The services that comprise the AnVIL are not only web services, but also human services. Meeting the needs of researchers everywhere requires security, user support, training, and project governance.
The guiding principle of our efforts is that progress in genomic data science will happen most rapidly if there is a diversity of interoperable solutions created by a plurality of groups. Toward that end, we will ensure that the AnVIL continues to drive toward interoperability and federation by participating in NIH-led and international efforts focused on standard setting and data sharing.