Machine learning approaches for the discovery, repurposing, and optimization of natural products with therapeutic potential

NIH RePORTER · NIH · R35 · $78,717

Abstract

Project Summary
Natural products from bacteria, fungi, and plants have long been a rich source of useful molecules. However, because their structures are complex, it is difficult to screen enough analogs of a natural product to understand the rules governing the relationship between its structure and activity. We will address this challenge by developing machine learning methods that can functionally model the structure-activity relationships (SARs) of natural products and aid in the design of biosynthetic pathways that can synthesize natural product analogs. We will thus develop methods both for prioritizing, for activity screens, the natural products most likely to be useful as therapeutics and for biosynthesizing natural products of interest.
Machine learning is a powerful computational technique that enables computers to make inferences from data. There is a wealth of sequence, structure, and activity data available for biological molecules that we can use to build machine learning models that predict the behavior of biochemical systems. Even machine learning algorithms that are not perfectly accurate can be extremely useful for drug discovery efforts. Machine learning can computationally screen orders of magnitude more compounds than can be tested in experimental high-throughput screens, so it can be used as an initial filter to increase hit rates in those screens.
Our first project will apply machine learning to study natural product SARs. We will take two approaches: a genetic approach and a chemical structure approach. In the genetic approach, we will validate correlations that we have previously observed between biosynthetic genes and natural product activity, and confirm that these correlations extend to the chemical substructures installed by the biosynthetic genes. In the chemical structure approach, we will investigate the ability of graph neural networks to predict natural product properties.
Our second project will focus on developing machine learning and other computational tools for designing biosynthetic gene clusters (BGCs) to biosynthesize novel natural product-like molecules. We will first focus on ribosomally synthesized and post-translationally modified peptides (RiPPs) and develop methods to predict compatible pairs of modifying enzymes and leader peptides. To do this, we will use molecular modeling, statistical coupling analysis (SCA), and machine learning. After validating our methods on RiPPs, we will turn our attention to more difficult classes of BGCs, such as those encoding nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs).
Our third project is the development of methods for designing RiPP-based protein-protein interaction (PPI) inhibitors. We will develop both molecular modeling and machine learning methods for predicting optimal RiPP sequences for inhibiting a PPI of interest. We will then use directed evolution experiments to validate these predictions and to collect additional training data.
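
The summary's point that an imperfect model can still raise hit rates when used as an initial filter can be illustrated with a short calculation. The sketch below is not part of the funded project: the function names and the example numbers (a 1% base hit rate, 70% sensitivity, 90% specificity) are assumptions chosen only for illustration. It applies Bayes' rule to estimate the fraction of true actives among the compounds a classifier flags.

```python
def filtered_hit_rate(base_rate, sensitivity, specificity):
    """Expected fraction of true actives among model-flagged compounds.

    base_rate:   fraction of actives in the unfiltered library
    sensitivity: fraction of actives the model flags
    specificity: fraction of inactives the model rejects
    All three values are hypothetical inputs, not measured results.
    """
    for name, value in (("base_rate", base_rate),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    true_pos = sensitivity * base_rate                    # actives that are flagged
    false_pos = (1.0 - specificity) * (1.0 - base_rate)   # inactives that are flagged
    flagged = true_pos + false_pos
    if flagged == 0.0:
        raise ValueError("the model flags no compounds")
    return true_pos / flagged


def enrichment(base_rate, sensitivity, specificity):
    """Fold increase in hit rate relative to screening the library unfiltered."""
    if base_rate <= 0.0:
        raise ValueError("base_rate must be positive to compute enrichment")
    return filtered_hit_rate(base_rate, sensitivity, specificity) / base_rate


if __name__ == "__main__":
    # Illustrative numbers only: 1% of compounds active, a modestly accurate model.
    rate = filtered_hit_rate(0.01, 0.7, 0.9)
    fold = enrichment(0.01, 0.7, 0.9)
    print(f"hit rate after filtering: {rate:.1%} ({fold:.1f}-fold enrichment)")
```

Under these assumed numbers, the hit rate among flagged compounds is about 6.6%, compared with 1% for the unfiltered library, even though the model misses 30% of the actives. A model with no predictive power (50% sensitivity and 50% specificity) leaves the hit rate unchanged at the base rate.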

Key facts

NIH application ID: 11168247
Project number: 3R35GM146987-03S1
Recipient: VANDERBILT UNIVERSITY
Principal Investigator: Allison Sara Walker
Activity code: R35
Funding institute: NIH
Fiscal year: 2024
Award amount: $78,717
Award type: 3
Project period: 2022-09-01 → 2027-08-31