MetaboQuest: A Suite of Tools for Metabolite Annotation

PROJECT SUMMARY

Metabolomics aims at high-throughput detection, quantification, and identification of metabolites in biological samples. Liquid chromatography coupled with mass spectrometry (LC-MS) has risen to prominence in metabolomics because it can analyze a large number of metabolites from a limited amount of biological material. However, in a typical untargeted metabolomics analysis of human samples by LC-MS, about 70% of the detected peaks represent unknown analytes, mainly because existing mass spectral libraries cover only a small fraction of known compounds, but also because of uncertainty in peak picking, peak alignment, and recognition of isotopic peaks and adduct forms. These challenges have slowed the development of data analytics pipelines for metabolomics and its integration with other omics studies. The goal of this Phase II SBIR proposal is to bring metabolomics studies on a par with other omics studies such as genomics, transcriptomics, and proteomics, for which well-established pipelines are available. By doing so, we will accelerate the role of metabolomics in systems biology approaches for various applications, including biomarker and drug discovery. To achieve this goal, we propose to develop a cloud-based platform that allows customers to build pipelines for analysis of LC-MS-based untargeted metabolomics data, from peak detection through metabolite annotation. This will be accomplished by implementing a suite of innovative tools that can be assembled into customized pipelines and by enhancing metabolite annotation accuracy through integration of information derived from multiple resources, including compound databases, pathways, biochemical networks, and mass spectral libraries.
Aim 1 of this proposal will focus on developing a suite of tools to enable: (1) peak detection, alignment, and quality assessment; (2) adduct and isotopic peak recognition; (3) mass-based search against multiple compound databases; (4) expert-based evaluation of putative IDs; (5) isotopic pattern analysis; (6) network-based evaluation of putative IDs; (7) spectral matching of MS/MS data against experimental and in silico fragmentation patterns; (8) deep learning-based prediction of compound fingerprints; and (9) integrative assessment of putative metabolite IDs via a probabilistic model. Aim 2 will assemble the tools developed in Aim 1 into a cloud-based platform, MetaboQuest, which provides users with interactive visualization of peaks, isotopic patterns, networks, and mass spectra. Furthermore, Aim 2 will focus on integrating into MetaboQuest a pipeline builder that allows users to create pipelines by linking modules and run them remotely through a modular interactive web interface. Aim 3 will perform a comprehensive evaluation of MetaboQuest in terms of metabolite annotation accuracy, number of annotated metabolites, and computational efficiency ...
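To make items (2) and (3) of Aim 1 concrete, the following is a minimal sketch of how a mass-based search that accounts for adduct forms might work. It is not MetaboQuest's implementation: the function name `mass_search`, the `Candidate` record, the three-compound in-memory table, the adduct subset, and the 10 ppm default tolerance are all assumptions made for illustration. A real search would query full compound databases.

```python
from dataclasses import dataclass

# Mass shifts (Da) that common singly charged adducts add to the neutral
# monoisotopic mass. Illustrative subset only (assumption).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+Na]+": 22.989218,
    "[M-H]-": -1.007276,
}

# Hypothetical toy "database": compound name -> neutral monoisotopic mass (Da).
COMPOUNDS = {
    "glucose": 180.06339,
    "caffeine": 194.08038,
    "alanine": 89.04768,
}


@dataclass(frozen=True)
class Candidate:
    name: str         # putative compound name
    adduct: str       # adduct form that explains the observed m/z
    ppm_error: float  # signed mass error in parts per million


def mass_search(mz, tolerance_ppm=10.0, compounds=COMPOUNDS, adducts=ADDUCT_SHIFTS):
    """Return putative IDs whose theoretical m/z lies within tolerance of mz."""
    hits = []
    for name, mono_mass in compounds.items():
        for adduct, shift in adducts.items():
            theoretical_mz = mono_mass + shift
            ppm = (mz - theoretical_mz) / theoretical_mz * 1e6
            if abs(ppm) <= tolerance_ppm:
                hits.append(Candidate(name, adduct, ppm))
    # Smallest absolute mass error first.
    return sorted(hits, key=lambda c: abs(c.ppm_error))


if __name__ == "__main__":
    for hit in mass_search(195.08766):
        print(f"{hit.name} {hit.adduct} {hit.ppm_error:+.2f} ppm")
```

Because a single observed m/z can match several compound and adduct combinations within tolerance, a search of this kind yields only putative IDs, which is why the later steps in Aim 1 (isotopic pattern analysis, network-based evaluation, MS/MS spectral matching) are needed to rank or reject candidates.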