# MegaPredict for predicting natural product uses and their drug interactions

> **NIH R43** · COLLABORATIONS PHARMACEUTICALS, INC. · 2020 · $155,686

## Abstract

Project Summary
The objective of ‘MegaPredict’ is to enable scientists to generate predictions for a natural product (or any
molecule) and identify targets for efficacy assessment as well as identify any potential liabilities. We are building
on our previous work which has compiled a comprehensive collection of datasets for structure-activity data for a
broad variety of disease targets and other properties, in a form ready for model building. All of these models
utilize the many sources of curated open data, including ChEMBL, ToxCast etc. We have developed a prototype
of MegaPredict that uses a Bayesian algorithm and ECFP6 fingerprints to output a list of prioritized ‘targets’. We
recognize that neither the algorithm nor the descriptors may be optimal, so we propose to address this as we
validate MegaPredict and develop a product over the course of this proposal. Our team is suitably qualified to develop the
software needed and we will leverage our large collaborator network to assist us in validating the activity of
compounds.
We will initially create a script to take a natural product and score it against many thousands of machine
learning models, then rank the outputs to propose efficacy targets. We will use over 12,000 ChEMBL-derived
target-assay / bioactivity groups extracted from the ChEMBL v24 database, as well as EPA Tox21
measurements and other public datasets, using methodology that we have already partially developed. We can
repeat this process for over 200 published compounds and assess the outputs against what is known. We intend
to compare how the approach performs with synthetic drugs or drug-like compounds as well as natural products.
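
The score-and-rank step described above can be illustrated with a minimal sketch. It is not the MegaPredict implementation: it assumes each molecule is already reduced to a set of integer feature ids (a hypothetical stand-in for ECFP6 bits), uses a simplified smoothed naive Bayes log-odds scorer, and the names `train_bayes`, `score`, `rank_targets`, `target_A` and `target_B` are invented for illustration. The training sets are toy data, not ChEMBL or Tox21 records.

```python
import math
from collections import Counter


def train_bayes(actives, inactives):
    """Fit per-feature log-odds weights with add-one smoothing.

    actives / inactives: lists of fingerprints, each a set of integer
    feature ids (assumed stand-ins for ECFP6 bits).
    """
    n_act, n_inact = len(actives), len(inactives)
    act_counts = Counter(f for fp in actives for f in fp)
    inact_counts = Counter(f for fp in inactives for f in fp)
    weights = {}
    for f in set(act_counts) | set(inact_counts):
        p_act = (act_counts[f] + 1) / (n_act + 2)
        p_inact = (inact_counts[f] + 1) / (n_inact + 2)
        weights[f] = math.log(p_act / p_inact)
    return weights


def score(weights, fingerprint):
    """Sum the weights of the features present; unseen features add 0."""
    return sum(weights.get(f, 0.0) for f in fingerprint)


def rank_targets(models, fingerprint, top_n=5):
    """Score one fingerprint against every model, highest score first."""
    scored = [(name, score(w, fingerprint)) for name, w in models.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy models: two hypothetical targets with made-up training fingerprints.
models = {
    "target_A": train_bayes([{1, 2, 3}, {1, 2, 4}], [{5, 6}, {6, 7}]),
    "target_B": train_bayes([{5, 6, 7}, {6, 7, 8}], [{1, 2}, {2, 3}]),
}
query = {1, 2, 9}  # fingerprint of the molecule to be profiled
ranking = rank_targets(models, query)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

With thousands of models the same loop applies unchanged; only the `models` dictionary grows.
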
We will assess whether other machine learning algorithms and molecular descriptors can improve
predictions. As we generate machine learning models such as Linear Logistic Regression, AdaBoost Decision
Tree, Random Forest, Support Vector Machine and deep neural networks (DNN) of varying depth we will assess
the predictions for natural products and compare with the Bayesian approach. We will compare ECFP6 with
other 2D and 3D descriptors and physicochemical properties in order to identify the optimal combination for
generating predictions for natural products and compare how this differs for synthetic compounds.
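
One way to compare algorithms on the same held-out compounds is to rank them by ROC AUC. The abstract does not name an evaluation metric, so the choice of AUC is an assumption, as are the function names `roc_auc` and `compare_models`. The labels and scores below are invented toy values, not results from any model.

```python
def roc_auc(labels, scores):
    """Probability that a random active outscores a random inactive.

    Ties count as half a win. labels are 1 (active) or 0 (inactive).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def compare_models(labels, scores_by_model):
    """Return (model name, AUC) pairs sorted from best to worst."""
    aucs = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    return sorted(aucs.items(), key=lambda kv: kv[1], reverse=True)


# Toy held-out set: made-up activity labels and made-up model scores.
labels = [1, 1, 0, 0, 1, 0]
scores_by_model = {
    "bayesian": [0.9, 0.4, 0.5, 0.2, 0.7, 0.1],
    "random_forest": [0.8, 0.7, 0.3, 0.2, 0.9, 0.4],
}
leaderboard = compare_models(labels, scores_by_model)
for name, auc in leaderboard:
    print(f"{name}: AUC = {auc:.3f}")
```

Running the same comparison separately on natural products and on synthetic compounds would show whether the best algorithm and descriptor combination differs between the two sets.
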
We will validate our predictions for natural product efficacy assessment. We will work closely with multiple
academic groups to generate predictions for at least 20 natural products of interest against over 20 different
targets or diseases. Our goal will be to identify potential targets that were previously unknown and then generate
in vitro data in-house or with academic collaborators.
Develop a prototype user interface for input of a structure, processing an input molecule and output of
prioritized targets and liabilities. We have developed multiple software prototypes (e.g. Assay Central, MegaTox,
etc.) previously and will ensure a user-friendly interface and develop new visualization methods and algorithms
...

## Key facts

- **NIH application ID:** 10055938
- **Project number:** 3R43AT010585-01S1
- **Recipient organization:** COLLABORATIONS PHARMACEUTICALS, INC.
- **Principal Investigator:** SEAN EKINS
- **Activity code:** R43 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $155,686
- **Award type:** 3
- **Project period:** 2019-08-15 → 2021-08-14

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10055938

## Citation

> US National Institutes of Health, RePORTER application 10055938, MegaPredict for predicting natural product uses and their drug interactions (3R43AT010585-01S1). Retrieved via AI Analytics 2026-06-02 from https://api.ai-analytics.org/grant/nih/10055938. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
