Project Abstract
Gene Set Enrichment Analysis (GSEA), introduced in 2003, is now standard practice for analyzing genome-wide expression data. GSEA derives its power from identifying the activation or repression of sets of genes that share a common biological function, chromosomal location, or regulation, and that differentiate biological phenotypes or cellular states. This knowledge-based approach is effective in elucidating underlying biological mechanisms and in generating hypotheses for further study and experimental validation. Since 2005, we have developed, distributed, and supported a freely available GSEA software application along with a database of annotated gene sets, the Molecular Signatures Database (MSigDB). This popular resource has more than 113,000 registered users and over 10,200 citations in the literature, and the MSigDB contains almost 18,000 annotated gene sets. The goal of this proposal is to continue to evolve and add value to the GSEA/MSigDB resource to best address the needs of the cancer research community, while maintaining the high level of professional quality and strong support that investigators have come to expect. We plan to increase the power and sensitivity of the GSEA method and enrich the MSigDB to further accelerate the pace of genomic research. Our specific aims are: Aim 1: Develop and deploy the next generation of the GSEA method and software to keep pace with the needs of the cancer research community. The new core algorithm will be based on information-theoretic approaches, guided by a collection of 100 relevant benchmarks and informed by an Advisory Board of established cancer researchers. To facilitate the use of GSEA by researchers at all levels of computational sophistication, we will distribute the GSEA analysis tools as both an open-source code library and a suite of user-friendly, reproducible, interactive electronic notebooks. Aim 2: Extend the scope and specificity of the MSigDB, and evolve the underlying technology.
In collaboration with the community, we will add valuable new collections to the MSigDB, including signatures of drug responses and genetic perturbations, sets for use with mouse models of cancer and PDXs, sets from pathway and network databases, and sets for use with proteomic data analysis. The MSigDB will be migrated from its current XML file format to a lightweight, portable relational database that can better support its growing size, online exploration tools, and use by investigators and other software. Aim 3: Provide training and outreach activities for the cancer research community, and maintain and support the GSEA software and MSigDB. The success and popularity of the GSEA/MSigDB resource over the past decade; our extensive experience in...