# Rational prioritization algorithm for docking-based virtual screening of trillion-scale make-on-demand small molecule libraries

> **NIH F32** · UNIVERSITY OF CALIFORNIA, SAN FRANCISCO · 2024 · $73,828

## Abstract

The advent of make-on-demand molecular libraries has greatly expanded the number of readily accessible molecules and has enabled the discovery of new chemotypes through structure-based docking. However, these libraries have now grown beyond the reach of even the fastest methods to dock them explicitly, and their growth shows no sign of slowing.

Various strategies have been explored to prioritize which molecules to dock from large collections, including machine learning (ML) methods. However, using ML to prioritize chemical matter has limitations. Because experimental training data are scarce, the models are often trained on docking results and therefore replicate the inaccuracies of the docking algorithm. The models may also be biased toward specific chemotypes, reducing the novelty of the molecules they select. Finally, generating 3D conformations, which is crucial for accurate predictions, is computationally demanding.

To overcome these limitations, I propose an iterative prioritization approach for docking trillion-scale libraries. The method first docks a small random subset of the library against the receptor. Each subsequent round then focuses on molecules similar to the high-scoring compounds from previous rounds, while still exploring new regions of chemical space. This approach substantially reduces the number of molecules that must be docked explicitly while strongly enriching the docked subset in promising candidates.

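
The round structure described above can be sketched in Python. This is a minimal in-memory sketch under stated assumptions, not the proposal's algorithm: fingerprints are represented as sets of on-bits, similarity is Tanimoto, `dock` is a hypothetical stand-in for a real docking call that returns a score where lower is better, and the parameter names `n_initial`, `n_per_round`, `top_k` and `explore_frac` are illustrative. At trillion scale, the exhaustive similarity scan over all undocked molecules would presumably need to be replaced by an approximate nearest-neighbour search.

```python
import random
from typing import Callable, Dict, FrozenSet

# Assumed representation: a fingerprint is the set of "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def iterative_prioritization(
    library: Dict[str, Fingerprint],
    dock: Callable[[str], float],  # hypothetical docking call; lower score = better
    n_initial: int,
    n_per_round: int,
    n_rounds: int,
    top_k: int,
    explore_frac: float = 0.2,
    seed: int = 0,
) -> Dict[str, float]:
    """Toy iterative docking loop: random start, then similarity-guided rounds."""
    rng = random.Random(seed)
    ids = list(library)

    # Round 0: dock a small random subset of the library.
    docked = {m: dock(m) for m in rng.sample(ids, n_initial)}

    for _ in range(n_rounds):
        undocked = [m for m in ids if m not in docked]
        if not undocked:
            break

        # The best-scoring molecules so far act as similarity queries.
        queries = sorted(docked, key=docked.get)[:top_k]

        def max_sim(m: str) -> float:
            return max(tanimoto(library[m], library[q]) for q in queries)

        # Exploitation: undocked molecules most similar to any query come first.
        ranked = sorted(undocked, key=max_sim, reverse=True)
        n_explore = int(round(n_per_round * explore_frac))
        n_exploit = n_per_round - n_explore
        picks = ranked[:n_exploit]

        # Exploration: a random draw from the rest keeps new chemical space in play.
        rest = ranked[n_exploit:]
        picks += rng.sample(rest, min(n_explore, len(rest)))

        for m in picks:
            docked[m] = dock(m)

    return docked


if __name__ == "__main__":
    # Toy library in which neighbouring IDs share bits and have similar scores.
    lib = {f"m{i}": frozenset(range(i, i + 8)) for i in range(200)}
    result = iterative_prioritization(
        lib, dock=lambda m: float(int(m[1:])),
        n_initial=20, n_per_round=10, n_rounds=3, top_k=5,
    )
    best = min(result, key=result.get)
    print(len(result), "docked; best:", best, result[best])
```

A fixed `explore_frac` is the simplest way to balance exploitation against exploration; an adaptive schedule would be a natural variant to test retrospectively.
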
The advantages of this approach are its efficiency, comparable to that of general ML methods, and its methodological transparency.

The study has two specific aims. First, I will develop and test algorithms for iteratively docking subsets of ultra-large small molecule libraries. Different molecular descriptors and parameters will be tested retrospectively on data from earlier large-scale docking screens performed in the Shoichet lab. The final algorithm, along with the best-performing parameters, will be made publicly available.

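
The first aim's retrospective benchmark can be sketched as a replay of an earlier screen, assuming its results are available as a table mapping molecule IDs to docking scores (lower is better). "Docking" then becomes a table lookup, and a prioritization run can be scored by how many of the screen's true top-scoring molecules it recovered. The function names and the choice of top-1% recall and enrichment factor as figures of merit are illustrative assumptions; the proposal does not specify its evaluation metrics.

```python
from typing import Callable, Dict, Iterable, Set


def top_fraction_ids(scores: Dict[str, float], frac: float) -> Set[str]:
    """IDs of the best-scoring `frac` of a completed screen (lower score = better)."""
    n = max(1, int(round(len(scores) * frac)))
    return set(sorted(scores, key=scores.get)[:n])


def recall_of_top(scores: Dict[str, float], docked_ids: Iterable[str], frac: float = 0.01) -> float:
    """Fraction of the true top-`frac` molecules that the prioritization actually docked."""
    top = top_fraction_ids(scores, frac)
    return len(top & set(docked_ids)) / len(top)


def enrichment_factor(scores: Dict[str, float], docked_ids: Iterable[str], frac: float = 0.01) -> float:
    """Top-molecule recall divided by the fraction of the library docked (random picking ~ 1)."""
    docked = set(docked_ids)
    return recall_of_top(scores, docked, frac) / (len(docked) / len(scores))


def make_replay_oracle(scores: Dict[str, float]) -> Callable[[str], float]:
    """Replays a finished screen: 'docking' a molecule becomes a table lookup."""
    return lambda mol_id: scores[mol_id]


if __name__ == "__main__":
    # Toy "previous screen" of 1,000 molecules; the true top 1% are m0..m9.
    screen = {f"m{i}": float(i) for i in range(1000)}
    # A hypothetical run that docked 100 molecules, 5 of them from the true top 1%.
    picked = [f"m{i}" for i in range(5)] + [f"m{i}" for i in range(500, 595)]
    print(recall_of_top(screen, picked), enrichment_factor(screen, picked))
```

An enrichment factor near 1 means a run did no better than random picking; comparing it across descriptors and parameter settings at the same docking budget is one way to rank them. The callable returned by `make_replay_oracle` can be passed as the docking function of any prioritization loop under test.
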
Second, the new methods will be applied to predict and test new ligands, examining the hypothesis that larger libraries lead to the discovery of more potent molecules. Screening the ~60 billion currently available molecules against AmpC β-lactamase will allow comparison with previous docking results, and selected compounds will be tested experimentally. Co-crystal structures of the most potent binders identified will also be solved.

This project addresses the barrier posed by the expanding chemical space and aims to accelerate ligand discovery for the broader community. Promising preliminary results support the feasibility of the approach.

## Key facts

- **NIH application ID:** 10901111
- **Project number:** 1F32GM154469-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
- **Principal Investigator:** Olivier Mailhot
- **Activity code:** F32 (individual postdoctoral fellowship)
- **Funding institute:** National Institute of General Medical Sciences (NIGMS), NIH
- **Fiscal year:** 2024
- **Award amount:** $73,828
- **Award type:** 1 (new application)
- **Project period:** 2024-05-01 → 2027-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10901111

## Citation

> US National Institutes of Health, RePORTER application 10901111, Rational prioritization algorithm for docking-based virtual screening of trillion-scale make-on-demand small molecule libraries (1F32GM154469-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10901111. Licensed CC0.

