Automated Molecular Identity Disambiguator (AutoMID)

NIH RePORTER · NIH · R01 · $293,345 · view on reporter.nih.gov ↗

Abstract

PROJECT SUMMARY Small molecules are one of the most important classes of therapeutics alleviating suffering and in many cases death for hundreds of millions of people worldwide. Small molecules also serve as invaluable tools to study biology, often with the goal to validate novel targets for the development of future therapeutic drugs. Reproducibility of experimental results and the interoperability and reusability of resulting datasets depend on accurate descriptions of associated research objects, and most critically on correct representations of small molecules that are tested in biological assays. For example, it is not possible to develop predictive models of protein target - small molecule interactions if their chemical structure representations are not correct. Many factors contribute to errors in reported chemical structures in small molecule screening and omics reference databases, scientific publications, and many other web-based resources and documents. Because of the complexity of representing small molecules chemical structure graphs and the lack of thorough curation, errors are frequently introduced by non-experts and error propagation across different digital research assets is a pervasive problem. To address this challenging problem via a scalable approach, we propose the Automated Molecular Identity Disambiguator (AutoMID). AutoMID will be usable in batch mode at scale via an API, for example to assist chemical structure standardization and registration by maintainers of digital research assets, and also via interactive (UI) mode for everyday researchers to quickly and easily validate or correct their small molecule representations. AutoMID will leverage extensive highly standardized linked databases of chemical structures and associated information including names, synonyms, biological activity and physical properties and their sources / provenance and leverage expert rules and AI to enable reliable disambiguation of chemical structure identities at scale.

Key facts

NIH application ID
9987129
Project number
1R01LM013391-01
Recipient
UNIVERSITY OF MIAMI SCHOOL OF MEDICINE
Principal Investigator
BARRY A BUNIN
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$293,345
Award type
1
Project period
2020-05-01 → 2024-02-29