# Integrated high throughput proteogenomic data analysis center for CPTAC

> **NIH U24** · BROAD INSTITUTE, INC. · 2020 · $930,291

## Abstract

Mass spectrometry (MS)-based proteomics is increasingly being used in conjunction with
genome profiling and next-generation sequencing (NGS) for large-scale characterization of
cancer samples including cell lines, patient-derived xenografts and tumor material. Publications
from the NCI-CPTAC program and others have highlighted the utility of proteogenomic analysis
in elucidating cancer biology and identifying aberrant proteins and signaling networks in cancer.
However, a high-throughput pipeline that implements a range of analyses to transform genomic
and proteomic data into information easily accessible to scientists is still lacking.
We propose an integrated high throughput proteogenomic data analysis center (PGDAC)
to address this immediate need. The PGDAC will exploit Firehose—a platform developed by our
group that has set the standard for genome and NGS data analysis—to implement a flexible,
robust, automated and reproducible proteogenomic data analysis pipeline and visualization
portal. This cloud-based near-real-time platform will not only include a robust version of the
pipeline created for recently completed proteogenomic studies from our group, but will also
incorporate new tools and algorithms, especially for the analysis and visualization of
phosphoproteomic data. The result will be an automated, version-controlled pipeline that
provides an integrated view of clinical, genomic (CNA, mRNA, RNA-seq, mutation) and
proteomic (global proteome, phosphoproteome, and other PTM) data, with analyses ranging
from correlation and clustering to marker identification and pathway enrichment. The FireBrowse
graphical user interface, combined with other visualization tools, will provide a familiar,
accessible and intuitive interactive experience for non-computational scientists. Analysis
results and reports will be hosted on a local web portal, in addition to being uploaded to the DCC.
The proteogenomic data analysis pipeline will support biomarker selection, enable
therapeutic target identification using disease-specific and pan-cancer cohorts, and quantify
changes to cellular signaling networks due to site-specific post-translational modifications and
genetic aberrations.

## Key facts

- **NIH application ID:** 10004584
- **Project number:** 5U24CA210979-05
- **Recipient organization:** BROAD INSTITUTE, INC.
- **Principal Investigator:** Chet Birger
- **Activity code:** U24
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $930,291
- **Award type:** 5
- **Project period:** 2016-09-15 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10004584

## Citation

> US National Institutes of Health, RePORTER application 10004584, Integrated high throughput proteogenomic data analysis center for CPTAC (5U24CA210979-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10004584. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*