# Enhanced deconvolution and prediction of mutational signatures

> **NIH R21** · BOSTON UNIVERSITY MEDICAL CAMPUS · 2020 · $179,438

## Abstract

The goals of this proposal are to develop novel statistical tools and a software package for performing mutational
signature deconvolution in cancer samples. Mutational signatures are patterns of co-occurring mutations that
can reveal insights into a cancer's etiology and evolution. Currently, non-negative matrix factorization (NMF) is
the “gold standard” for mutational signature deconvolution. However, NMF has several limitations: it cannot
1) easily characterize patterns within the flanking sequence beyond the trinucleotide context,
2) simultaneously characterize patterns across several genomic features, or 3) predict
mutational signatures of new samples given a previously trained model.
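
To make the first limitation concrete, the sketch below labels single-base substitutions by their flanking context and counts how many channels each context width produces. It follows the common convention of reporting every substitution relative to the pyrimidine (C or T) reference base, which gives 96 channels for a trinucleotide context and 1,536 for a pentanucleotide one. The function names and the toy mutation list are hypothetical and are not taken from the proposal or its software.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def mutation_channel(context, alt):
    """Label a substitution at the center of an odd-length context.

    `context` is the reference sequence centered on the mutated base and
    `alt` is the alternate base. Purine references (A or G) are flipped to
    the opposite strand so that every label has a C or T reference.
    """
    center = len(context) // 2
    if context[center] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    ref = context[center]
    return f"{context[:center]}[{ref}>{alt}]{context[center + 1:]}"


def n_channels(flank):
    """Number of channels with `flank` bases on each side of the mutation.

    Six strand-normalized substitution types times 4 choices per flanking base.
    """
    return 6 * 4 ** (2 * flank)


def count_channels(mutations):
    """Count mutations per channel; `mutations` is a list of (context, alt)."""
    return Counter(mutation_channel(context, alt) for context, alt in mutations)


if __name__ == "__main__":
    # Toy data only: (reference context, alternate base) pairs.
    sample = [("ACG", "T"), ("CGT", "A"), ("TCA", "G")]
    print(count_channels(sample))
    for flank in (1, 2, 3):
        print(f"{2 * flank + 1}-base context: {n_channels(flank)} channels")
```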
In this proposal, we will develop a novel discrete Bayesian hierarchical model to characterize mutational
signatures in tumor sequencing data that overcomes the limitations of NMF. These types of models are
commonly used in text mining applications to infer topics by examining co-occurring word counts across
documents. Our model will be able to characterize information about the flanking sequence far beyond the
trinucleotide context, incorporate information from other genomic features such as strand or region, and predict
signatures in single samples. Importantly, unlike with NMF, including extra genomic features in our
clustering algorithm will not reduce power for discovery and will aid prediction of mutational
signatures in targeted sequencing data by incorporating additional information.
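
The prediction idea in this paragraph can be illustrated with a small sketch. In the text-mining analogy, a sample is a document, a mutation channel is a word, and a signature is a topic, so the signature weights of a new sample can be estimated by holding previously learned signatures fixed and fitting only the mixing weights. The code below does this with plain expectation-maximization on a multinomial mixture. It is a simplified maximum-likelihood stand-in, not the hierarchical Bayesian model the proposal will develop, and `predict_exposures`, the toy signatures, and the counts are all hypothetical.

```python
def predict_exposures(counts, signatures, n_iter=200):
    """Estimate signature mixing weights for one sample with fixed signatures.

    counts:     mutation counts per channel for the new sample.
    signatures: one probability vector over channels per signature
                (each sums to 1), assumed to come from a trained model.
    Returns one weight per signature; the weights sum to 1.
    """
    k = len(signatures)
    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(n_iter):
        expected = [0.0] * k
        for channel, n in enumerate(counts):
            if n == 0:
                continue
            # E-step: probability that a mutation in this channel came
            # from each signature, given the current weights.
            joint = [weights[s] * signatures[s][channel] for s in range(k)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # channel has zero probability under every signature
            for s in range(k):
                expected[s] += n * joint[s] / norm
        assigned = sum(expected)
        if assigned == 0.0:
            raise ValueError("no mutation can be explained by the signatures")
        # M-step: new weights are the expected share of mutations per signature.
        weights = [e / assigned for e in expected]
    return weights


if __name__ == "__main__":
    # Toy example with four channels and two overlapping signatures.
    toy_signatures = [
        [0.7, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.7],
    ]
    new_sample = [30, 5, 5, 10]
    for name, weight in zip(("sig_A", "sig_B"), predict_exposures(new_sample, toy_signatures)):
        print(f"{name}: {weight:.3f}")
```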
We will also develop an R/Bioconductor package for data preprocessing, inference, and visualization, which will
streamline mutational signature analysis for researchers. Both NMF and our novel model will be available in the
package so users can compare and contrast the different computational approaches for mutational signature
inference. The package will also be able to interface with several existing projects from the
Informatics Technology for Cancer Research (ITCR) program. Finally, we will generate reference mutational
signatures by analyzing a large-scale cancer exome sequencing dataset from The Cancer Genome Atlas
(TCGA) that can be used to predict mutational signatures in single samples in clinical workflows. Overall, our
model will be of great interest to the cancer community as it will provide greater insights into mutational signature
patterns and will be useful in clinical settings where mutational signature inference is performed in single
samples.

## Key facts

- **NIH application ID:** 9878085
- **Project number:** 5R21CA226188-02
- **Recipient organization:** BOSTON UNIVERSITY MEDICAL CAMPUS
- **Principal Investigator:** Joshua D Campbell
- **Activity code:** R21
- **Funding institute:** NIH (the `CA` code in the project number denotes the National Cancer Institute)
- **Fiscal year:** 2020
- **Award amount:** $179,438
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2019-03-01 → 2022-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9878085

## Citation

> US National Institutes of Health, RePORTER application 9878085, Enhanced deconvolution and prediction of mutational signatures (5R21CA226188-02). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9878085. Licensed CC0.

