PROJECT SUMMARY/ABSTRACT
Human papillomavirus-positive (HPV+) oral cancer (OC), which accounts for over 70% of oropharyngeal cancer cases in North America and Europe, has been found to be more aggressive than HPV-negative OC, with a higher tendency to metastasize. This aggressiveness is believed to be associated with the oncogenic mechanisms triggered by HPV infection. HPV encodes two potent oncogenes, E6 and E7, whose products inactivate the key tumor suppressors p53 and pRb, respectively, and thereby alter the gene expression profile of oral epithelial cells. To identify the molecular mechanisms of HPV oncogenesis, numerous studies have compared the (epi)genomic profiles of HPV+ OC with those of normal oral epithelium, HPV-negative OC, or other cancer types. These studies have generated high-throughput sequencing datasets using different methods (transcriptomic, genomic and epigenomic) and under different cellular conditions (normal, virus-infected and cancerous). However, these datasets have not been fully explored because there is no common analysis platform for interrogating them efficiently, a problem made worse by the high heterogeneity and batch effects across studies. We propose to leverage our data science experience and close wet-lab collaborations to perform an integrative analysis that identifies HPV-specific biomarkers in HPV+ OC. We propose to integrate (epi)genomic next-generation sequencing datasets from 11 selected studies (with more to be added as they become available), covering 3 data types, 13 cell lines, 3 viral infection stages and 2 anatomically similar sites. The analysis centers on removing potential batch effects and biases from the integrated datasets and on establishing appropriate controls for nominating oncogenic biomarkers. To validate the findings, we propose both dry-lab and wet-lab experiments to evaluate candidate biomarkers. Insights from the proposed study could advance our understanding of oral biology and may ultimately translate into novel therapeutics for HPV+ OC.