Integrated high throughput proteogenomic data analysis center for CPTAC

NIH RePORTER · NIH · U24 · $930,291

Abstract

Mass spectrometry (MS)-based proteomics is increasingly used in conjunction with genome profiling and next-generation sequencing (NGS) for large-scale characterization of cancer samples, including cell lines, patient-derived xenografts and tumor material. Publications from the NCI-CPTAC program and others have highlighted the utility of proteogenomic analysis in elucidating cancer biology and identifying aberrant proteins and signaling networks in cancer. However, a high throughput pipeline that implements a range of analyses to transform genomic and proteomic data into information easily accessible to scientists is still lacking. We propose an integrated high throughput proteogenomic data analysis center (PGDAC) to address this immediate need. The PGDAC will exploit Firehose, a platform developed by our group that has set the standard for genome and NGS data analysis, to implement a flexible, robust, automated and reproducible proteogenomic data analysis pipeline and visualization portal. This cloud-based, near-real-time platform will not only include a robust version of the pipeline created for recently completed proteogenomic studies from our group, but will also incorporate new tools and algorithms, especially for the analysis and visualization of phosphoproteomic data. The result will be an automated, version-controlled pipeline that provides an integrated view of clinical, genomic (CNA, mRNA, RNA-seq, mutation) and proteomic (global proteome, phosphoproteome, and other PTM) data, with analyses ranging from correlation and clustering to marker identification and pathway enrichment. The FireBrowse graphical user interface, combined with other visualization tools, will provide a familiar, accessible and intuitive interactive user interface for non-computational scientists. Analysis results and reports will be hosted on a local web portal, in addition to being uploaded to the DCC.
The proteogenomic data analysis pipeline will support biomarker selection and therapeutic target identification in disease-specific and pan-cancer cohorts, and will quantify changes to cellular signaling networks caused by site-specific post-translational modifications and genetic aberrations.

Key facts

NIH application ID: 10004584
Project number: 5U24CA210979-05
Recipient: BROAD INSTITUTE, INC.
Principal Investigator: Chet Birger
Activity code: U24
Funding institute: NIH
Fiscal year: 2020
Award amount: $930,291
Award type: 5
Project period: 2016-09-15 → 2022-08-31