# Unified Computation Tools for Natural Products Research

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2021 · $547,104

## Abstract

The overarching goal of this proposed renewal application is to further advance tools that are in development and to
effectively integrate several types of analytical data with biological assay data and genomic information. This will create a
powerful set of tools for faster and more accurate identification of new molecules, dereplication of known ones, and
direct inference of biological activities from spectroscopic information. In the current period of support, we have made
substantial progress in developing highly useful tools for automatic annotations and identifications of organic molecules,
specifically focused on natural products. The Global Natural Products Social (GNPS) Molecular Networking analysis and
knowledge dissemination ecosystem has processed almost 160,000 jobs in nearly 160 countries, receives 4,000-6,000
new job submissions per month, is accessed over 200,000 times a month (most of these accesses are for the reference
library, inspection of public data, and previous jobs that the community shares as hyperlinks in papers), and has become a
mainstream tool for the annotation of organic molecules deriving from diverse sources, especially in metabolomics
workflows. The public website for Small Molecule Accurate Recognition Technology (SMART), a deep learning model
for providing candidate structures based on ¹H-¹³C HSQC NMR data, went live in December 2019 and already has over
3,000 jobs in 50 countries. All tools developed in this proposal will become part of this analysis ecosystem. The four
laboratories contributing to this proposed research activity have formed an open and integrated team that continues to
develop innovative informatic tools to enhance small molecule structure annotation and the inference of their chemical
and biological properties. We have four specific aims: 1) To complete the development and evaluation of a set of new
and innovative tools for natural products analysis, and deploy these as freely available resources for the worldwide
community. 2) To refine the structural characterization of molecules by leveraging repository-scale mass
spectral information along with NMR data and genomic inputs. 3) To create a new SMART-based tool that
integrates mass spectrometry and HSQC NMR data as the input for a new deep learning system with the goal of
achieving more accurate predictions of structure. 4) To use deep learning to enhance SMART with bioactivity data
so as to enable SMART to predict activities of molecules based on spectroscopic features. The data will also augment
the GNPS database with biological assay binding data. An additional consequence of these goals will be the further
digitization of natural products analytical data so that they can be used in the computational tools planned herein, as
well as other tools in the future. Completion of these four specific aims will create new integrated tools for the precise
identification of new natural product structures, and enable...

## Key facts

- **NIH application ID:** 10211176
- **Project number:** 2R01GM107550-09
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** GARRISON W COTTRELL
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $547,104
- **Award type:** 2
- **Project period:** 2013-09-05 → 2025-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10211176

## Citation

> US National Institutes of Health, RePORTER application 10211176, Unified Computation Tools for Natural Products Research (2R01GM107550-09). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10211176. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
