Unified Computation Tools for Natural Products Research

NIH RePORTER · NIH · R01 · $547,104

Abstract

Summary: The overarching goal of this proposed renewal application is to further advance tools that are in development and to effectively integrate several types of analytical data with biological assay data and genomic information. This will create a powerful set of tools for faster and more accurate identification of new molecules, dereplication of known ones, and direct inference of biological activities from spectroscopic information. In the current period of support, we have made substantial progress in developing highly useful tools for the automatic annotation and identification of organic molecules, with a specific focus on natural products. The Global Natural Products Social (GNPS) Molecular Networking analysis and knowledge dissemination ecosystem has processed almost 160,000 jobs in nearly 160 countries, receives 4,000-6,000 new job submissions per month, and is accessed over 200,000 times a month (the majority of accesses are for reference library access, inspection of public data, and previous jobs that the community shares as hyperlinks in papers). It has become a mainstream tool for the annotation of organic molecules deriving from diverse sources, especially in metabolomics workflows. The public website for Small Molecule Accurate Recognition Technology (SMART), a deep learning model that provides candidate structures based on 1H-13C HSQC NMR data, went live in December 2019 and has already run over 3,000 jobs in 50 countries. All tools developed in this proposal will become part of this analysis ecosystem. The four laboratories contributing to this proposed research have formed an open and integrated team that continues to develop new informatic tools to enhance small molecule structure annotation and the inference of chemical and biological properties.
We have four specific aims: 1) To complete the development and evaluation of a set of new and innovative tools for natural products analysis, and to deploy these as freely available resources for the worldwide community. 2) To refine the structural characterization of molecules by leveraging repository-scale mass spectral information along with NMR data and genomic inputs. 3) To create a new SMART-based tool that takes mass spectrometry and HSQC NMR data together as the input to a new deep learning system, with the goal of achieving more accurate structure predictions. 4) To use deep learning to enhance SMART with bioactivity data, enabling SMART to predict the activities of molecules based on spectroscopic features. The data will also augment the GNPS database with biological assay binding data. An additional consequence of these goals will be the further digitization of natural products analytical data so that they can be used in the computational tools planned herein, as well as in other tools in the future. Completion of these four specific aims will create new integrated tools for the precise identification of new natural product structures, and enable...

Key facts

NIH application ID
10211176
Project number
2R01GM107550-09
Recipient
UNIVERSITY OF CALIFORNIA, SAN DIEGO
Principal Investigator
GARRISON W COTTRELL
Activity code
R01
Funding institute
NIH
Fiscal year
2021
Award amount
$547,104
Award type
2
Project period
2013-09-05 → 2025-04-30