# Support for the use and evaluation of large cloud-based genomic datasets.

> **NIH U24** · STANFORD UNIVERSITY · 2023 · $217,082

## Abstract

The goal of this project is to develop cloud-focused training and benchmarks for the use of cloud
resources provided by selected NIH-funded large data consortia. There is a need for training via
interactive jamborees in the use of existing software for large-scale data analysis via the cloud. We will
also evaluate the usefulness of the cloud by comparing the NIH AnVIL service as a platform versus the
native Google Cloud Platform for developing and comparing pipelines, for interactive analyses such as
setting up and using Jupyter notebooks, and for running benchmarks via automated leaderboards. These
leaderboards will be compared with those provided by Synapse as an alternative. We have experience
in designing, implementing, and hosting jamborees or workshops for the education of trainees and staff
researchers in the tools and methods available to utilize large cloud-based data, and to integrate these
data into computational analysis pipelines. The cloud is becoming a critical component for computational
biologists and the greater biomedical research community in exploring the human genome, its
regulation, association with disease, and structure. However, because of the complexity of using
cloud-based resources, this has not trickled down to students and researchers without advanced computing skills.
In our 10 years of experience implementing and managing analysis pipelines on the cloud, we have
shown the advantages of cloud-based large-scale analyses. The goal of this project is to
provide interactive training for the use of cloud-based analysis software and easy-to-use data sharing, and
to evaluate two platforms in their ability to assist in creating effective tools. We have cross-cutting and
unparalleled technical expertise in data management, genomics, informatics, network analysis, and
privacy-preserving applications, as well as experience leading large data coordination centers, managing
and coordinating data and metadata, and creating gold-standard knowledgebases.

## Key facts

- **NIH application ID:** 10827800
- **Project number:** 3U24HG012012-03S2
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** J. Michael Cherry
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $217,082
- **Award type:** 3
- **Project period:** 2023-09-07 → 2026-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10827800

## Citation

> US National Institutes of Health, RePORTER application 10827800, Support for the use and evaluation of large cloud-based genomic datasets. (3U24HG012012-03S2). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10827800. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
