# A Web-Based Automatic Virtual Screening System

> **NIH R01** · UNIVERSITY OF CALIFORNIA, SAN FRANCISCO · 2024 · $339,150

## Abstract

Our long-term goal is to bring small molecules to biologists and chemical biologists by developing easy-to-use tools and libraries that rapidly identify reagents. A second goal is to use these libraries and tools to predict biological activity for key compound classes, advancing the science and demonstrating proof of concept.

The tools introduced by this research program have become central to virtual screening. The ZINC database is the most widely used compound library in the field, and our DUD and DUD-E benchmarks are ubiquitous in virtual screening. Recently, our development of ultra-large libraries has been embraced by the field. The Similarity Ensemble Approach (SEA) brings chemoinformatic target prediction to a large community, and we have used it to predict drug off-targets, their side effects, and the activities of supposedly inert molecules. Here we extend both projects, further developing community libraries and tools in Aim 1 and applying them to the prediction of biological activities in Aim 2. The specific aims are:

**Aim 1. New tools to bring chemistry to biology.** An exciting result of the last period was the introduction of ultra-large libraries. While an accessible library of >20 billion molecules has expanded our horizons, the two-component reactions from which these molecules derive are inevitably limiting. We will:

- A. develop a “chemistry commons” of more elaborate virtual molecules available from academic labs, testing them in Aim 2;
- B. expand the chemistry available for covalent docking to develop new community-accessible libraries of selective electrophiles for covalent inhibitor discovery;
- C. optimize the widely used DUD-E benchmarks, introducing new subsets to address the biases that they certainly still retain;
- D. integrate into ZINC methods that enable similarity searches for analogs in sublinear time.

**Aim 2. Libraries of high-value compounds, and their activities.** We will:

- A. test the utility of the more elaborate virtual libraries from Aim 1 in cases where they are experimentally tested;
- B. test the new covalent electrophilic libraries in docking campaigns against the SARS-CoV-2-relevant proteases 3CLpro and TMPRSS2;
- C. expand our interest in target discovery by chemoinformatics, focusing on compounds that are widely used in biology because they are presumed inactive: drug excipients and Generally Recognized As Safe (GRAS) food additives;
- D. ask whether GRAS molecules have on-target pharmacology, as we found with drug excipients, testing our predictions experimentally.

Although these goals are ambitious, extensive preliminary results support their feasibility.

## Key facts

- **NIH application ID:** 10818367
- **Project number:** 5R01GM071896-19
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN FRANCISCO
- **Principal Investigator:** John J. Irwin
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $339,150
- **Award type:** 5
- **Project period:** 2004-08-01 → 2026-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10818367

## Citation

> US National Institutes of Health, RePORTER application 10818367, A Web-Based Automatic Virtual Screening System (5R01GM071896-19). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10818367. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
