Centralized assay datasets for modelling support of small drug discovery organizations

NIH RePORTER · NIH · R44 · $855,127

Abstract

Project Summary

Collaborations Pharmaceuticals, Inc. was formed after identifying a need for software to assist academics and smaller companies in curating their data and in discovering new hits or optimizing leads. In the past two years the continued importance of artificial intelligence (AI) has been apparent from the explosive growth in the number of AI companies and the increasing number of multi-million dollar deals with pharma that use machine learning (ML) to assist in drug discovery. These companies focus heavily on the modeling aspect of drug discovery, but there is a continued unmet need and bottleneck in the curation of quality in vitro and in vivo ADME/Tox data for ML, as well as in prospective testing to validate the technologies.

In Phase I, we developed a prototype of the Assay Central® software and used it with a wide variety of structure-activity data from public and private sources, formatted and unformatted, with ~14 collaborators working on neglected, rare or common disease targets, as well as for our internal drug discovery projects. In Phase I we also created error checking and correction software, built and validated Bayesian models with the datasets that were collected and cleaned, and developed new data visualization tools. The software can be used to create selections of these models for sharing with collaborators as needed, for scoring new molecules, and for visualizing the multiple outputs in various formats.

In Phase II, we have developed Assay Central® into a production tool that is easy to deploy, is built on industry-standard technologies, and provides graphical display of models and information on model applicability. Importantly, we found that customers wanted us to provide them with the results. We therefore developed a fee-for-service consulting model that uses Assay Central® to solve their problems, and this has expanded our revenues annually.
In Phase II we evaluated additional ML algorithms and molecular descriptors with manually curated datasets, and compared algorithms across more than 5000 auto-curated datasets from ChEMBL. This illustrated the utility of access to multiple algorithms and showed that the Bayesian algorithm was generally comparable to these other ML algorithms. It also motivated us to develop new software to integrate these algorithms. We have also explored finding rare disease datasets and applying our data curation and ML approach to them. With these and additional collaborations, as well as internal projects on Alzheimer’s disease (through an NIH NIGMS supplement), we have been able to repurpose already approved drugs for several targets for this and other diseases. For multiple projects we have performed several rounds of model building and fed data back into the models to enable improved predictions. Finally, we have developed prototype tools that enable us to generate automated molecule designs, assess their synthesizability and perform retrosynthetic analysis. These ...

Key facts

NIH application ID: 10321747
Project number: 2R44GM122196-04A1
Recipient: COLLABORATIONS PHARMACEUTICALS, INC.
Principal Investigator: SEAN EKINS
Activity code: R44
Funding institute: NIH
Fiscal year: 2021
Award amount: $855,127
Award type: 2
Project period: 2017-01-01 → 2023-07-31