# Ultra-large library docking for ligand discovery

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN FRANCISCO · 2021 · $389,285

## Abstract

Despite much interest in expanding chemical space, diverse, billion-molecule libraries remain inaccessible. In
principle, docking a virtual library could access some of this missing chemical space. This idea has until now
been vitiated by two key problems: 1. prediction of readily synthesized molecules has been challenging, without
resorting to strategies that collapse diversity; and 2. docking is notoriously inaccurate. Two recent advances
have made virtual library docking screens seem less fanciful. First, our collaborators at Enamine, a widely used
fine chemicals supplier, have defined a 0.7 billion molecule make-on-demand library based on >100 reactions
that they have under good control; >650,000 of these have been successfully synthesized. Second, while
docking retains serious errors, it has made pragmatic progress, and has found genuinely novel ligands for >100
targets. The specific aims are:

Aim 1. A robust, searchable, and dockable database of 3 billion diverse lead-like molecules. We will A.
Enumerate 3 billion vetted products from two- and three-component reactions. B. Measure the diversity and
novelty of this library and how they differ from the world's in-stock molecules. C. Develop a community-accessible
database and chemoinformatics infrastructure that can store, similarity search, and rapidly retrieve molecules
from this library. D. Convert these molecules into biologically relevant 3D forms, including enumerating
low-energy conformers and computing partial atomic charges, van der Waals parameters, solvation energies, and
other parameters for all library molecules, enabling their use for docking screens.

Aim 2. Dock and experimentally test the library against two targets. A. Screen the library against the dopamine
D4 and kappa-opioid receptors, seeking novel ligands. 250 to 500 library molecules will be tested per screen,
itself a 10-fold increase. A key question will be whether we find novel, potent ligands or are overwhelmed by false
positives. B. As the library grows, do we continue to find ever more novel, in some sense ever more perfect,
high-affinity ligands, or does discovery saturate? C. How does hit rate vary with docking score? As we will be
testing hundreds of molecules, we can afford to investigate not only those with the highest docking ranks, but
also molecules with mediocre and poor ranks. This has not been previously explored, certainly not at scale.

If successful, this project will increase the number of molecules available to the community by 1000-fold, and
demonstrate their utility for ligand discovery. Extensive preliminary results support its feasibility.
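
The question posed in Aim 2C, how hit rate varies with docking score, can be illustrated by binning tested molecules by score and computing the fraction of confirmed hits in each bin. The following is a minimal sketch only: the function name `hit_rate_by_score_bin`, the bin width, and the input format are assumptions made for illustration and are not taken from the project.

```python
import math
from collections import defaultdict


def hit_rate_by_score_bin(results, bin_width=5.0):
    """Group tested molecules into docking-score bins and compute hit rates.

    `results` is an iterable of (docking_score, is_hit) pairs, where
    `is_hit` records whether the molecule was confirmed experimentally.
    Returns {bin_lower_edge: (n_tested, n_hits, hit_rate)}, ordered from
    the lowest (most negative) bin edge upward.
    NOTE: hypothetical helper for illustration, not the project's method.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")

    counts = defaultdict(lambda: [0, 0])  # edge -> [n_tested, n_hits]
    for score, is_hit in results:
        # Lower edge of the bin containing this score.
        edge = math.floor(score / bin_width) * bin_width
        counts[edge][0] += 1
        counts[edge][1] += int(bool(is_hit))

    return {
        edge: (n_tested, n_hits, n_hits / n_tested)
        for edge, (n_tested, n_hits) in sorted(counts.items())
    }
```

Testing molecules across the full range of ranks, as the aim proposes, is what makes the lower-scoring bins in such a table non-empty; a screen that tests only top-ranked molecules cannot estimate how quickly the hit rate falls off.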

## Key facts

- **NIH application ID:** 10240701
- **Project number:** 5R01GM133836-03
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
- **Principal Investigator:** John J. Irwin
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $389,285
- **Award type:** 5
- **Project period:** 2019-09-27 → 2023-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10240701

## Citation

> US National Institutes of Health, RePORTER application 10240701, Ultra-large library docking for ligand discovery (5R01GM133836-03). Retrieved via AI Analytics 2026-05-21 from https://api.ai-analytics.org/grant/nih/10240701. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*