Discovering clinical endpoints of toxicity via graph machine learning and semantic data analysis

NIH RePORTER · NIH · K99 · $91,530

Abstract

This project proposes the development of new methods and data resources to integrate modern artificial intelligence (AI) techniques into predictive toxicology, as well as the application of those methods and resources to generate new hypotheses linking putative toxicants to specific clinical outcomes. The recent explosion of publicly available chemical and biomedical data provides an immensely valuable resource for computational toxicologists, but existing techniques for learning from these data perform poorly and fail to capture crucial patterns that span multiple levels of biological organization. For example, the US FDA maintains a computational toxicology database cataloguing over 875 thousand chemicals of toxicologic concern, yet only a small handful of these have been characterized in terms of their downstream clinical effects. Informatics and machine learning (ML), however, provide specific tools that may help close this gap. This project focuses on two of them in particular: graph machine learning (Graph ML) and semantic data analysis. Because both techniques can integrate information from multiple otherwise incongruent sources, they have the capacity to outperform simpler traditional methods for pattern discovery while increasing both inferential capacity and statistical power. Our central hypothesis is that inductive learning on semantic graph data provides an effective means for generating and validating translational and mechanistic conclusions from existing public toxicology data. In Aim 1 (K99), a new data infrastructure, driven by a large, ontology-controlled graph database aggregating public toxicology data, will be constructed and evaluated on several important tasks in computational toxicology. Together, these resources will be named ComptoxAI. Aim 2 (K99) will develop and apply a graph machine learning strategy to predict new adverse outcome pathways (AOPs) in the graph database.
Importantly, this aim will use an automated machine learning (Auto ML) approach to discover optimized neural network architectures for this prediction task in a data-driven manner. This Auto ML strategy will use estimation of distribution algorithms (EDAs) to search the space of candidate network architectures probabilistically. An expected side effect of the Auto ML approach is greater model interpretability than in existing applications of Graph ML. Aim 3 (R00) will use semantic data analysis via ontological inference to refine Aim 2's model outputs into meaningful knowledge, proposing specific mechanistic explanations for the newly predicted AOPs. Aim 4 (R00) will use the resources and outcomes of the previous Aims as a starting point to develop and disseminate new open-source data standards, software resources, and research reporting protocols, with the goal of creating a collaborative, cross-institutional research ecosystem for AI research in computational toxicology. Beyond the methodological and infrastr...
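The abstract does not say which EDA will be used for the Auto ML architecture search. As a minimal sketch, assuming the simplest member of the family, the univariate marginal distribution algorithm (UMDA), the loop below keeps one independent probability distribution per architecture choice, samples a population of candidate architectures, keeps the best-scoring ones, and re-estimates the distributions from that elite set. The search space (`SEARCH_SPACE`) and scoring function (`toy_fitness`) are hypothetical; in a real Auto ML run, each candidate's fitness would instead come from training and validating the corresponding graph neural network, which is far more expensive.

```python
import random
from collections import Counter

# Hypothetical search space for a graph neural network; not taken from the project.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "hidden_dim": [16, 32, 64, 128],
    "aggregator": ["mean", "sum", "max"],
}

def toy_fitness(arch):
    """Hypothetical stand-in for the validation score of a trained model (higher is better)."""
    score = -abs(arch["num_layers"] - 3)
    score -= abs(SEARCH_SPACE["hidden_dim"].index(arch["hidden_dim"]) - 2)
    return score + (1 if arch["aggregator"] == "mean" else 0)

def sample_architecture(probs, rng):
    """Draw each choice independently from its current marginal distribution."""
    return {k: rng.choices(options, weights=probs[k])[0] for k, options in SEARCH_SPACE.items()}

def umda(fitness, pop_size=30, n_elite=10, generations=20, smoothing=0.1, seed=0):
    """Univariate marginal distribution algorithm (UMDA), a basic EDA."""
    rng = random.Random(seed)
    # Start with uniform marginals over every option.
    probs = {k: [1 / len(v)] * len(v) for k, v in SEARCH_SPACE.items()}
    best = None
    for _ in range(generations):
        population = [sample_architecture(probs, rng) for _ in range(pop_size)]
        population.sort(key=fitness, reverse=True)
        elite = population[:n_elite]
        if best is None or fitness(elite[0]) > fitness(best):
            best = elite[0]
        # Re-estimate each marginal from option frequencies in the elite set;
        # additive smoothing keeps every option's probability above zero.
        for k, options in SEARCH_SPACE.items():
            counts = Counter(arch[k] for arch in elite)
            total = n_elite + smoothing * len(options)
            probs[k] = [(counts[o] + smoothing) / total for o in options]
    return best, probs

best, probs = umda(toy_fitness)
print(best, toy_fitness(best))
```

Calling `umda(toy_fitness)` returns the best architecture seen and the final marginal distributions. UMDA's independence assumption is a deliberate simplification; other EDAs model dependencies between choices (for example, between depth and hidden width) at the cost of a more complex probabilistic model.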

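Aim 3 relies on ontological inference, but the abstract does not name a reasoner or ontology language. The sketch below is a minimal illustration, assuming RDF-style triples: it forward-chains two standard RDFS entailment rules (rdfs11, transitivity of `rdfs:subClassOf`, and rdfs9, inheritance of `rdf:type` along subclass links) until no new triples appear. The `ex:` class and entity names are hypothetical examples, not ComptoxAI identifiers, and a real system would more likely rely on an established RDF/OWL reasoner than on this naive loop.

```python
# Minimal naive forward chaining of two RDFS entailment rules (sketch only).
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        new = set()
        # rdfs11: (A subClassOf B) and (B subClassOf C) => (A subClassOf C)
        for a, b in subclass_pairs:
            for c, d in subclass_pairs:
                if b == c:
                    new.add((a, SUBCLASS_OF, d))
        # rdfs9: (x type A) and (A subClassOf B) => (x type B)
        for s, p, o in inferred:
            if p == TYPE:
                for a, b in subclass_pairs:
                    if o == a:
                        new.add((s, TYPE, b))
        if new <= inferred:  # fixed point reached
            return inferred
        inferred |= new

# Hypothetical toy ontology fragment plus one hypothetical model prediction.
EXAMPLE_TRIPLES = {
    ("ex:DrugInducedLiverInjury", SUBCLASS_OF, "ex:LiverDisease"),
    ("ex:LiverDisease", SUBCLASS_OF, "ex:Disease"),
    ("ex:predicted_endpoint_1", TYPE, "ex:DrugInducedLiverInjury"),
}

closure = rdfs_closure(EXAMPLE_TRIPLES)
for triple in sorted(closure - EXAMPLE_TRIPLES):
    print(triple)
```

Here, a hypothetical prediction typed only as `ex:DrugInducedLiverInjury` is also entailed to be an `ex:LiverDisease` and an `ex:Disease`, one simple way that ontological structure can add context to an individual model output.
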
Key facts

NIH application ID: 10489356
Project number: 5K99LM013646-02
Recipient: UNIVERSITY OF PENNSYLVANIA
Principal Investigator: Joseph Daniel Romano
Activity code: K99
Funding institute: NIH
Fiscal year: 2022
Award amount: $91,530
Award type: 5
Project period: 2021-09-15 → 2022-12-31