Project Summary
Many steps in drug discovery, as well as in basic biological research, could be accelerated with profiling technologies: methods that measure multiple phenotypes simultaneously. When a cell or organism’s function is affected by disease, a chemical compound, or a genetic perturbation, simultaneously measuring many phenotypes provides greater power and less bias in identifying the impact. But many existing assays require months of painstaking development and measure only one specific phenotype, missing potentially crucial information. For example, a chemical might have positive effects in a disease-related assay while its effects on other pathways go unmeasured, leading to toxicity that is only discovered later. Research in this MIRA period will focus on advancing algorithms and applications for image-based profiling, a surprisingly quantitative type of profiling that is the least expensive and among the highest in information content. Image-based profiling captures the location and amount of each stained cellular component, as well as changes in morphology, but its applications are underexplored and its algorithms underdeveloped.
Having invented the main assay and software in the field, we aim to bring the technique to maturity now that four things have become available or possible: (a) Image Data - huge quantities of suitable systematic, structured, high-throughput, single-cell image data, usually from the Cell Painting assay, available via several public-private partnerships and totaling more than 3 billion single cells across more than 100,000 genetic and chemical perturbations; (b) Algorithms - novel deep learning algorithms for several steps in profiling: segmentation, feature extraction, and learning predictive models; (c) Integration - other data sources now available at a scale that can be fruitfully combined with images; (d) Applications - of more than a dozen theoretical applications in basic biological research and drug discovery, many have not been attempted or scaled up, such as determining compounds’ mechanism of action, identifying their targets, discovering relationships with genes, predicting toxicity or other assay activity, and identifying gene function. To fulfill the promise of image-based profiling and demonstrate its real-world efficacy, we therefore aim to leverage recently available data and algorithms to carry out diverse biological applications, including identifying gene- and compound-associated phenotypes and functions, virtual screening to identify potential compounds that target genes of interest, hypothesizing the mechanism of action/targets for small molecules, computationally predicting assay activity and toxicity, and identifying screenable disease-associated phenotypes. In doing so, we aim to make rapid progress in algorithms, including trained neural networks/deep learning models, multi-modal integration, visualization/interpretation, batch correction, and single-cell methods.