A Federated Galaxy for user-friendly large-scale cancer genomics research

NIH RePORTER · NIH · U24 · $171,200

Abstract

Project Summary

Cancer research is now a data-driven discipline, but only a minority of cancer researchers are data scientists. This severely restricts our ability to study and cure the disease. The far-reaching significance of our project lies in federating disparate data and computational resources to provide a unifying analysis platform for computational cancer research. We will extend the popular scientific workbench Galaxy (https://galaxyproject.org) so that it can integrate with the distributed data and compute resources that cancer researchers use and need, including those in the NCI Cancer Research Data Commons (NCR DC). Our Federated Galaxy system will allow users to seamlessly access NCR DC data across multiple resources. It will support multiple analysis scenarios tuned to the skills and computational requirements of individual researchers. The aims of this project are:

Aim 1. Extend Galaxy for working with distributed cancer genomics and phenotypic data. This will enable Galaxy users to access both public and private cancer data regardless of where the data are physically stored. Best-practice approaches will be used for accessing restricted datasets.

Aim 2. Enhance Galaxy for context-aware, distributed cancer genomics analyses using shared workflow representations. This will enable Galaxy users to run genomics analyses on different clouds, ultimately reducing the time, cost, and data transfer associated with analyses.

Aim 3. Apply Federated Galaxy to precision oncology research. Workflows developed in this aim will leverage the technologies from Aims 1 and 2 to benchmark machine learning algorithms for predicting tumor phenotype and drug response. Interactive reports will summarize the benchmarking results and use Informatics Technology for Cancer Research (ITCR) visualizations for deep dives into them.
Our system will provide a single access point to distributed cancer datasets and will enable these data to be analyzed within one portal in a way that satisfies multiple analysis scenarios and uses diverse computational resources. Finally, a cloud-centric Galaxy built for the NCR DC will substantially grow the community of users working with the Genomic Data Commons (GDC) and the NCR DC, because Galaxy brings with it a vibrant worldwide community of users and developers numbering tens of thousands of scientists. These individuals will help tune the GDC and other resources within the NCR DC to the needs of real-life analysis scenarios and will enrich the set of tools accessible to cancer researchers.

Key facts

NIH application ID: 10908030
Project number: 7U24CA231877-06
Recipient: H. LEE MOFFITT CANCER CTR & RES INST
Principal Investigator: Jeremy Goecks
Activity code: U24
Funding institute: NIH
Fiscal year: 2022
Award amount: $171,200
Award type: 7
Project period: 2018-09-11 → 2024-08-31