xARA: ARA through Explainable AI

NIH RePORTER · NIH · OT2 · $655,783

Abstract

In response to the NIH FOA OTA-19009 “Biomedical Translator: Development”, we propose to build an Autonomous Relay Agent (ARA) that can characterize and rate the quality of information returned from multiple multiscale, heterogeneous knowledge providers (KPs). Biomedical researchers develop a trust relationship with a KP through frequent and continued use. Over time, this familiarity shapes their understanding of 1) how to structure and invoke more effective queries, 2) the quality of results they can expect for different query parameters and feature values, and 3) how to assess the relevancy of a specific query’s results. Although this information retrieval paradigm has served the research community moderately well, it does not scale, and the number, scope, and complexity of KPs are increasing at a dramatic pace (1,613 molecular biology databases were reported as of Jan. 2019). In this ever-changing information landscape, a biomedical researcher now has two choices: either continue using the few KPs they have learned to trust and remain limited in the actionable information they receive, or invest the time and accept the risk of using a range of new information resources with little or no familiarity and thus uncertain effectiveness. If researchers are to benefit from the vast and expanding array of NIH- and industry-sponsored information assets now available, new information retrieval and quality assessment technologies will be required. We propose to build an Explanatory Autonomous Relay Agent (xARA) that can characterize query results by rating the quality of information returned from multiscale, heterogeneous KPs.
The xARA will use multiple information retrieval and explainable artificial intelligence (xAI) strategies to query multiple heterogeneous KPs, rank their results by quality and relevancy, and identify and explain any inconsistencies among databases in their responses to the same query. To deliver on this promise, we will use case-based reasoning and language models trained on biomedical data (i.e., BioBERT and custom annotation embeddings derived from Reactome and UniProt), enabling a new level of query profiling and assessment. These strategies will permit 1) information gaps to be filled by testing alternative query patterns that differ in surface syntax yet express semantically related, actionable concepts, 2) inconsistencies to be identified for a given query feature value, and 3) semantically redundant query results to be identified and eliminated or merged via similarity metrics enriched by the case-based reasoning strategies that the xAI community uses to characterize machine learning model behavior and performance. The xARA capabilities proposed herein will be based on strategies developed in Dr. Weber’s lab for information retrieval where the desire for greater transparency when reasoning over experi...
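The abstract does not specify how redundancy or inconsistency would be computed, so the following is only a minimal Python sketch under stated assumptions, not the xARA implementation. It assumes each KP answer already carries an embedding vector (in the proposal these would come from models such as BioBERT; here toy 3-dimensional vectors stand in). It uses cosine similarity with an arbitrary 0.95 threshold to merge semantically redundant answers, and it flags a query feature value as inconsistent when the answers returned for it fall into more than one semantic cluster. `KPResult`, `merge_redundant`, `find_inconsistencies`, and the example KPs and values are all hypothetical, and case-based reasoning is not modeled.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class KPResult:
    """One hypothetical answer returned by a knowledge provider (KP)."""
    kp: str           # name of the KP that returned the answer
    subject: str      # query feature value, e.g. a gene identifier
    predicate: str    # relationship that was queried
    obj: str          # the answer the KP returned
    embedding: tuple  # precomputed vector for the answer (assumed to exist)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def merge_redundant(results, threshold=0.95):
    """Greedily cluster results whose embeddings are at least `threshold` similar.

    Each cluster is represented by its first member; a result joins the first
    cluster whose representative it matches, otherwise it starts a new cluster.
    """
    clusters = []
    for r in results:
        for cluster in clusters:
            if cosine(cluster[0].embedding, r.embedding) >= threshold:
                cluster.append(r)
                break
        else:
            clusters.append([r])
    return clusters


def find_inconsistencies(results, threshold=0.95):
    """Flag (subject, predicate) pairs whose answers form more than one semantic cluster.

    For each flagged pair, map each competing answer (the cluster representative)
    to the sorted list of KPs that support it, as raw material for an explanation.
    """
    groups = defaultdict(list)
    for r in results:
        groups[(r.subject, r.predicate)].append(r)
    flagged = {}
    for key, group in groups.items():
        clusters = merge_redundant(group, threshold)
        if len(clusters) > 1:
            flagged[key] = {c[0].obj: sorted({r.kp for r in c}) for c in clusters}
    return flagged


# Hypothetical responses from three KPs to the same query.
EXAMPLE_RESULTS = [
    KPResult("KP-A", "GENE1", "associated_with", "disease X", (1.0, 0.0, 0.1)),
    KPResult("KP-B", "GENE1", "associated_with", "Disease X", (0.99, 0.01, 0.1)),
    KPResult("KP-C", "GENE1", "associated_with", "disease Y", (0.0, 1.0, 0.0)),
]

if __name__ == "__main__":
    for cluster in merge_redundant(EXAMPLE_RESULTS):
        print([r.kp for r in cluster])
    print(find_inconsistencies(EXAMPLE_RESULTS))
```

Here the KP-A and KP-B answers differ only in surface form and merge into one cluster, while KP-C's answer forms a second cluster, so the GENE1 query is flagged as inconsistent, with the supporting KPs listed per answer. Greedy first-match clustering is used only for brevity; it is order-dependent, and a real system might prefer hierarchical clustering or a learned similarity.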

Key facts

NIH application ID: 10547257
Project number: 3OT2TR003448-01S2
Recipient: TUFTS MEDICAL CENTER
Principal Investigator: Joseph Gormley
Activity code: OT2
Funding institute: NIH
Fiscal year: 2022
Award amount: $655,783
Award type: 3
Project period: 2020-01-24 → 2022-11-30