A statistical framework to systematically characterize cancer driver mutations in noncoding genomic regions

NIH RePORTER · NIH · R21 · $193,576

Abstract

PROJECT SUMMARY Cancer genomes typically harbor a substantial number of somatic mutations. Relatively few of these are driver mutations that actually alter protein function in tumor cells; most are considered functionally neutral passenger mutations. Over the past decade, the search for cancer driver mutations has focused on coding regions, and several mutational significance algorithms have been developed for coding mutations. The contribution of mutations in noncoding regulatory regions to tumor formation remains largely unknown. Current mutational significance algorithms are also not designed to detect driver mutations in noncoding regions, because coding and noncoding mutations differ biologically. The emerging availability of large whole-genome sequencing datasets (e.g., the PCAWG and HMF datasets) creates an opportunity to develop new mutational significance algorithms designed specifically for the interpretation of noncoding regions. We recently developed a statistical approach that identifies driver mutations in coding regions based on their nucleotide context. Critically, considering the nucleotide context around mutations does not require prior knowledge of their functional consequences. We therefore hypothesize that generalizing our nucleotide context model to noncoding regions will uncover novel noncoding driver mutations that cannot be detected with currently available mutational significance approaches. To this end, we will develop a statistical framework that incorporates the biological differences between coding and noncoding mutations and is specifically designed to detect driver mutations in noncoding regions.
Specifically, we will model the context-dependent distribution of passenger mutations, model the background mutation rate and accurately partition it, model the sequence composition of the reference genome, and account for fluctuations in sequencing coverage. We will then combine these statistical components by taking the product of their underlying probabilities, treating the components as independent. We will derive a significance p-value using a Monte Carlo simulation approach and apply false discovery rate (FDR) control to correct for multiple hypothesis testing. This strategy will allow us to accurately estimate the significance of somatic mutations in noncoding genomic regions. We will next apply this statistical framework to whole-genome sequencing data from 5,523 tumor patients, thereby deriving a comprehensive list of candidate driver mutations in noncoding regions. Finally, we will investigate whether noncoding mutations are overrepresented in transcription factor binding sites, and whether they regulate gene expression levels, induce alternative splicing, or affect epigenomic states. Upon completion of this project, we will have developed and applied a statistical framework for the discovery of significant somatic mutations in noncoding regions, and defined the mutational landscape of the no...
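The combine, simulate, and correct steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the project's implementation. The helper names (`combined_log_prob`, `monte_carlo_pvalue`, `benjamini_hochberg`), the example regions and their component probabilities, and the toy null model (each component probability drawn independently and uniformly on (0, 1]) are all hypothetical. The abstract does not say which FDR procedure will be used, so Benjamini-Hochberg appears here only as one standard choice. The sketch sums log-probabilities rather than multiplying raw probabilities, which avoids floating-point underflow.

```python
import math
import random

# Hypothetical helpers illustrating a combine -> Monte Carlo -> FDR pipeline.

def combined_log_prob(component_probs):
    """Log of the product of component probabilities, assuming the components are independent.

    Summing logs instead of multiplying raw probabilities avoids floating-point underflow.
    """
    return sum(math.log(p) for p in component_probs)


def monte_carlo_pvalue(observed_score, simulate_null_score, n_sim=10_000, rng=None):
    """Empirical p-value: share of simulated null scores at least as extreme as the observed one.

    Smaller (more negative) log-probabilities are more extreme. The (1 + count) / (1 + n_sim)
    form keeps the estimate strictly positive.
    """
    if rng is None:
        rng = random.Random(0)
    count = sum(1 for _ in range(n_sim) if simulate_null_score(rng) <= observed_score)
    return (1 + count) / (1 + n_sim)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy usage: each hypothetical region has four component probabilities, standing in for
# the context, background-rate, sequence-composition, and coverage components.
regions = {
    "region_A": [0.01, 0.02, 0.05, 0.10],
    "region_B": [0.40, 0.70, 0.55, 0.90],
    "region_C": [0.05, 0.30, 0.02, 0.20],
}
k = 4


def toy_null_score(rng):
    # Toy null (assumption): each component probability is independently uniform on (0, 1].
    return combined_log_prob([1.0 - rng.random() for _ in range(k)])


pvals = {
    name: monte_carlo_pvalue(combined_log_prob(probs), toy_null_score, rng=random.Random(42))
    for name, probs in regions.items()
}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
for name in regions:
    print(f"{name}: p = {pvals[name]:.4g}, q = {qvals[name]:.4g}")
```

In a real analysis, the null scores would more plausibly come from redistributing mutations under the fitted background model (nucleotide context, mutation rate, sequence composition, and coverage) rather than from uniform draws. The uniform null is used here only to keep the sketch self-contained.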

Key facts

NIH application ID: 9957082
Project number: 5R21CA242861-02
Recipient: DANA-FARBER CANCER INST
Principal Investigator: Eliezer M Van Allen
Activity code: R21
Funding institute: NIH
Fiscal year: 2020
Award amount: $193,576
Award type: 5
Project period: 2019-07-01 → 2021-12-31