Cancer Genomics: Integrative and Scalable Solutions in R/Bioconductor

NIH RePORTER · NIH · U24 · $846,735

Abstract

Bioconductor is a crucial resource for statistical analysis and data management in cancer genomics research, providing more than 2,200 open-source software and data packages. This software ecosystem is supported by core data classes and methods that provide convenient representations and efficient operations for many kinds of high-throughput omics data. Recent technical advances enable increasingly resolved study of the molecular biology of cancer at the single-cell level, through combined assaying of DNA sequence, epigenetics, gene expression, protein, and other aspects, even with spatial information. These developments present new challenges in the complexity, size, and interpretability of data analysis. The overarching goal of this project is to maintain and expand the core Bioconductor software infrastructure to meet these challenges, through the following aims. First, we will maintain and expand infrastructure for multimodal experiments and spatial transcriptomics, and connect the R/Bioconductor ecosystem with non-R image analysis tools to facilitate statistical analysis of histopathology images in the context of spatial transcriptomics and other molecular data. Second, we will transition Bioconductor’s data and annotation-sharing tools to a federated, language-agnostic system that facilitates contribution and extension by the community, improves findability, enables automated improvement in metadata, and enhances utility for non-R users. Third, we will create curated and integrated data repositories that make key datasets more findable and usable, and drive the development of the planned new data-sharing systems. Finally, we will develop a program of user training and new outreach approaches to support the training of users and developers, including the creation of a large language model-based chatbot and a cloud-based platform with persistent disks for courses and workshops. By fostering knowledge dissemination and practical training, we aim to empower researchers with the necessary skills and resources to leverage the enhanced capabilities of Bioconductor for advanced cancer genomics research.

Key facts

NIH application ID: 10865962
Project number: 1U24CA289073-01
Recipient: GRADUATE SCHOOL OF PUBLIC HEALTH AND HEALTH POLICY
Principal Investigator: Sean Davis
Activity code: U24
Funding institute: NIH
Fiscal year: 2024
Award amount: $846,735
Award type: 1
Project period: 2024-07-01 → 2029-06-30