Support for the use and evaluation of large cloud-based genomic datasets.

NIH RePORTER · NIH · U24 · $217,082

Abstract

The goal of this project is to develop cloud-focused training and benchmarks for the use of cloud resources provided by selected NIH-funded large data consortia. There is a need for training, via interactive jamborees, in the use of existing software for large-scale data analysis in the cloud. We will also evaluate the usefulness of the cloud by comparing the NIH AnVIL service as a platform against the native Google Cloud Platform for developing and comparing pipelines, for interactive analyses such as the setup and use of Jupyter notebooks, and for running benchmarks via automated leaderboards. These leaderboards will be compared with those provided by Synapse as an alternative. We have experience in designing, implementing, and hosting jamborees and workshops that educate trainees and staff researchers in the tools and methods available for using large cloud-based data and for integrating these data into computational analysis pipelines. The cloud is becoming a critical component for computational biologists and the greater biomedical research community in exploring the human genome, its regulation, its association with disease, and its structure. However, because of the complexity of using cloud-based resources, this has not trickled down to students and researchers without advanced computing skills. In our 10 years of experience implementing and managing analysis pipelines on the cloud, we have shown the advantages of cloud-based large-scale analyses. The goal of this project is to provide interactive training in the use of cloud-based analysis software, to provide easy-to-use data sharing, and to evaluate two platforms in their ability to assist in creating effective tools.
We have cross-cutting and unparalleled technical expertise in data management, genomics, informatics, network analysis, and privacy-preserving applications, along with experience leading large data coordination centers, managing and coordinating data and metadata, and creating gold-standard knowledgebases.

Key facts

NIH application ID
10827800
Project number
3U24HG012012-03S2
Recipient
STANFORD UNIVERSITY
Principal Investigator
J. Michael Cherry
Activity code
U24
Funding institute
NIH
Fiscal year
2023
Award amount
$217,082
Award type
3
Project period
2023-09-07 → 2026-05-31