Rational prioritization algorithm for docking-based virtual screening of trillion-scale make-on-demand small molecule libraries

NIH RePORTER · NIH · F32 · $73,828

Abstract

The advent of make-on-demand molecular libraries has significantly expanded the number of readily accessible molecules and enabled the discovery of new chemotypes through structure-based docking. However, the libraries have now grown beyond the ability of even the fastest methods to explicitly dock them, and their growth shows no sign of slowing. Various strategies have been explored to prioritize molecules to dock from large collections, including machine learning (ML) methods. However, applying ML to chemical matter prioritization has limitations. Because experimental data for training are insufficient, the models are often trained on docking results and so replicate the inaccuracies of the docking algorithm. Additionally, the models may be biased towards specific chemotypes, reducing novelty. Moreover, generating 3D conformations, which is crucial for accurate predictions, poses computational challenges.

To overcome these limitations, I propose an iterative prioritization approach for docking trillion-scale libraries. The method begins by docking a small random subset of the library to the receptor; later rounds focus on molecules similar to high-scoring compounds from previous rounds while ensuring exploration of new chemical space. This approach significantly reduces the number of explicitly docked molecules while greatly enriching the docked subset with promising candidates. The advantages of this approach include its efficiency, comparable to that of general ML methods, and its methodological transparency.

The specific aims of the study are twofold. First, I will develop and test algorithms for iteratively docking subsets of ultra-large small molecule libraries. Different molecular descriptors and parameters will be retrospectively tested using data from earlier large-scale docking screens from the Shoichet lab. The final algorithm, along with the best-performing parameters, will be made publicly available. Second, the new methods will be applied to predict and test new ligands, examining the idea that larger libraries lead to the discovery of more potent molecules. Screening the ~60 billion presently available molecules against AmpC β-lactamase will allow comparison with previous docking results, and selected compounds will be experimentally tested. Furthermore, co-crystal structures of the most potent binders identified will be solved.

This project addresses the barrier posed by the expanding chemical space and aims to accelerate ligand discovery for the whole community. The feasibility of the approach is supported by promising preliminary results.
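
For illustration only, the iterative prioritization described in the abstract could be sketched roughly as below. This is not the applicant's algorithm, which the abstract does not specify in detail. Everything here is an assumption made for the example: the function names (`tanimoto`, `iterative_screen`, `toy_dock`), the parameters (`n_rounds`, `batch_size`, `n_seeds`, `explore_fraction`), the use of Tanimoto similarity on bit-set fingerprints as the molecular descriptor, and the synthetic library and scoring function, which stand in for real molecules and a real docking program.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def iterative_screen(library, dock, n_rounds=5, batch_size=50,
                     n_seeds=10, explore_fraction=0.2, seed=0):
    """Dock a library in rounds, biasing later rounds towards earlier hits.

    library: dict mapping molecule id -> frozenset fingerprint (assumed format)
    dock:    callable, molecule id -> score (lower is better; hypothetical)
    Returns a dict mapping each docked molecule id to its score.
    """
    rng = random.Random(seed)
    undocked = set(library)
    scores = {}
    for round_idx in range(n_rounds):
        if not undocked:
            break
        n_pick = min(batch_size, len(undocked))
        pool = sorted(undocked)  # sorted so runs are reproducible
        if round_idx == 0:
            # Round 0: a purely random subset of the library.
            batch = rng.sample(pool, n_pick)
        else:
            # Exploration: keep some random picks to reach new chemical space.
            n_explore = int(round(explore_fraction * n_pick))
            explore = rng.sample(pool, n_explore)
            explore_set = set(explore)
            # Exploitation: rank the rest by similarity to the best scorers so far.
            seeds = sorted(scores, key=scores.get)[:n_seeds]
            remaining = [m for m in pool if m not in explore_set]
            remaining.sort(
                key=lambda m: max(tanimoto(library[m], library[s]) for s in seeds),
                reverse=True,
            )
            batch = explore + remaining[:n_pick - n_explore]
        for mol_id in batch:
            scores[mol_id] = dock(mol_id)
        undocked.difference_update(batch)
    return scores


# Toy demonstration on synthetic data (not real chemistry).
rng = random.Random(1)
library = {f"mol{i}": frozenset(rng.sample(range(64), 8)) for i in range(2000)}
ideal = frozenset(range(8))  # hypothetical "ideal ligand" bit pattern


def toy_dock(mol_id):
    # Stand-in for a docking score: more negative means better.
    return -tanimoto(library[mol_id], ideal)


scores = iterative_screen(library, toy_dock)
print(len(scores), "of", len(library), "molecules docked")
```

The sketch ranks every undocked molecule by similarity in each round, which is only workable for a toy library. At the scales discussed in the abstract, that step would presumably need a sub-linear similarity search or a library organization that avoids enumerating all molecules; the abstract does not say how this is handled.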

Key facts

NIH application ID: 10901111
Project number: 1F32GM154469-01
Recipient: UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
Principal Investigator: Olivier Mailhot
Activity code: F32
Funding institute: NIH
Fiscal year: 2024
Award amount: $73,828
Award type: 1
Project period: 2024-05-01 → 2027-04-30