Project Summary

The proposed work will accelerate the pace of drug discovery by developing, validating, and testing new methods, tools, and resources for structure-based drug design. Two fundamental challenges of structure-based drug design are the accurate scoring and ranking of protein-ligand structures, which identifies active compounds, and the ability to efficiently search a large number of ligands, which ensures that active compounds are sampled. This proposal will address these challenges by developing a novel approach for protein-ligand scoring and expanding the size of the chemical space that can be efficiently searched during lead optimization. The methods will be validated by their prospective application toward the discovery of new anti-cancer molecules and will be made readily accessible through online resources and open-source tools.

The proposal leverages recent and significant advances in deep learning and image recognition to develop scoring functions that accurately recognize high-affinity protein-ligand interactions. This is achieved by designing and training convolutional neural nets on three-dimensional representations of protein-ligand structures to discriminate between binders and non-binders. Convolutional neural net training will exploit large datasets of affinity and structural data to automatically extract the relevant features necessary to accurately prioritize compounds. Additionally, the proposal develops the first means of fully integrating a convolutional neural net scoring function directly into an energy minimization and docking workflow.

Interactive virtual screening enables the search of millions of compounds in a few seconds so that queries can be interactively optimized. Interactivity enables the synergistic unification of human expert knowledge and efficient computational algorithms. The proposed work will dramatically expand the size of chemical space accessible through interactive virtual screening.
Algorithms for efficiently searching the chemical space of billions or trillions of compounds implicitly defined by a set of reaction schemas and fragments will be created as part of a lead optimization workflow. Fragment-oriented search will be accelerated by a new data structure that combines pharmacophore and molecular shape information into a single sub-linear time index.

The scoring and lead optimization methods developed in this proposal will be released as open-source software and made immediately available through open-access online resources. As part of the prospective validation of the proposed methods, these resources will be used to identify hit compounds and optimize leads for two targets related to cancer metabolism: serine hydroxymethyltransferase and kidney glutaminase isoform C. Successful completion of the objectives of this proposal will positively impact public health by reducing the cost and time-to-market of developing new drugs, particularly with respect to novel protein targets.
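
As an illustration of what it means for a chemical space to be implicitly defined by reaction schemas and fragments, the following minimal sketch counts and lazily enumerates the products implied by a small set of schemas. It is not the proposed search algorithm: the schema names, fragment labels, and the functions space_size and enumerate_space are hypothetical, and the count assumes every fragment combination yields a distinct product.

```python
from itertools import product
from math import prod

# Hypothetical reaction schemas (illustrative only): each name maps to the
# fragment pools that can fill its reactant slots.
SCHEMAS = {
    "amide_coupling": (
        ["acid_1", "acid_2", "acid_3"],
        ["amine_1", "amine_2"],
    ),
    "suzuki_coupling": (
        ["halide_1", "halide_2"],
        ["boronate_1", "boronate_2", "boronate_3", "boronate_4"],
    ),
}


def space_size(schemas):
    """Count the implied products without enumerating any of them.

    Assumes each fragment combination gives a distinct product.
    """
    return sum(prod(len(pool) for pool in pools) for pools in schemas.values())


def enumerate_space(schemas):
    """Lazily yield (schema name, fragment tuple), one per implied product."""
    for name, pools in schemas.items():
        for combo in product(*pools):
            yield name, combo


if __name__ == "__main__":
    print(space_size(SCHEMAS))
    for name, combo in enumerate_space(SCHEMAS):
        print(name, combo)
```

The size grows multiplicatively with the number of reactant slots: a single three-slot schema with 10,000 fragments per slot implies 10^12 products. Explicit enumeration is therefore impractical at that scale, which is the motivation for searching the fragments through an index instead.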