EVIDARA: Automated Evidential Support from Raw Data for relay agents in Biomedical KG Queries

NIH RePORTER · NIH · OT2 · $437,326

Abstract

(1) Component: Autonomous Relay Agent. We will develop an ARA named EVIDARA to evaluate returns from queries in knowledge sources (KS) using a new epistemology: the “reasoning” is based on checking against empirical evidence available in raw data (measurements) instead of deductive reasoning (FIG.►). EVIDARA will assist the Autonomous Relay System (ARS) to identify paths in returned knowledge graphs (KG) that may conflict with real-world evidence and to relay queries to the appropriate specialty KS or database. (2) Problem addressed: EHR and multi-omics raw data from large cohorts, if properly preprocessed [e.g., by Knowledge Providers, such as the DOCKET, see application by Dr. Glusman], offer a new opportunity for ad hoc systematic extraction of empirical knowledge on relationships (“Protein P level correlates with risk for disease D”) instead of relying on specific epidemiological analyses. The problem in harnessing raw data for empirical support in lieu of deductive reasoning is that the KGs to be evaluated are extracted from knowledge sources of distinct types and that the relevance of paths depends on the query context Q. Also, the ARA algorithm should be scalable to digest the emerging multi-omics data from projects like All-of-Us and the UK Biobank. (3) Plan for implementation: Research will be conducted to evaluate a new epistemic realm: making empirical evidence central to “reasoning”. We have assembled a set of functioning tools to overcome the chicken-and-egg problem of getting a project started and to jumpstart development and testing of EVIDARA: (i) SPOKE, one of the largest biomedical knowledge networks (KN), has integrated 25 diverse KS into a single (neo4j) network database of 2 million nodes and will serve as a testing ground for research well before we can use KGs produced by the Knowledge Providers. (ii) Algorithms that use raw data from EHR and multi-omics studies to evaluate the returned KGs.
For instance, we compute weights of all nodes in the entire KN through a random-walk algorithm biased by their role for a given condition Q observed in the raw data. (iii) Raw data beyond EHR: multi-omics profiles from a study at ISB with >10k variables, which vastly exceeds the coverage of observable nodes in KNs offered by EHRs. Example query: “Vitamin K stimulates stem-cell signaling, thus could promote cancer. What is the molecular pathway? Mechanisms returned as KG will be pruned by EVIDARA and checked against correlative evidence in the raw data: Is there evidence that taking Vit. K or its antagonist reduces cancer risk?”. Importantly, since EVIDARA learns on a network of many types of KS, it will provide information to the ARS about which type of KS/Knowledge Provider to invoke next (in iterative queries) to improve the knowledge graph. (4) Expertise & resources: The MPIs, Drs. S. Baranzini (UCSF) and S. Huang (ISB), are researchers with a long history of working with medical big data, thus offering technical expertise and the critical SME perspective. SB’...
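
The abstract does not spell out how the random walk in item (ii) is biased. As a rough illustration only, the sketch below uses a random walk with restart (a personalized-PageRank-style scheme), which is one common way to weight every node in a network relative to a set of seed nodes. The function name, the toy graph, the node names, and the seed weights are all hypothetical; in practice the seed weights would presumably be derived from how strongly each node relates to condition Q in the raw data.

```python
from typing import Dict, List


def biased_random_walk(
    graph: Dict[str, List[str]],
    seed_weights: Dict[str, float],
    restart: float = 0.15,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Random walk with restart over an adjacency-list graph.

    `seed_weights` stands in for evidence from raw data (an assumption):
    at every step the walker jumps back to a seed node with probability
    `restart`, choosing the seed in proportion to its weight.
    Every neighbor listed in `graph` must also be a key of `graph`.
    """
    nodes = sorted(graph)
    total = sum(seed_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one seed node needs a positive weight")
    restart_dist = {n: seed_weights.get(n, 0.0) / total for n in nodes}

    scores = dict(restart_dist)  # start the walker on the seeds
    for _ in range(max_iter):
        new = {n: restart * restart_dist[n] for n in nodes}
        for n in nodes:
            neighbors = graph[n]
            if neighbors:
                # Spread the remaining mass evenly over the neighbors.
                share = (1.0 - restart) * scores[n] / len(neighbors)
                for m in neighbors:
                    new[m] += share
            else:
                # Dangling node: send its mass back to the seeds.
                for m in nodes:
                    new[m] += (1.0 - restart) * scores[n] * restart_dist[m]
        delta = sum(abs(new[n] - scores[n]) for n in nodes)
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy network loosely echoing the Vitamin K example query.
toy_graph = {
    "VitaminK": ["GeneA"],
    "GeneA": ["VitaminK", "GeneB"],
    "GeneB": ["GeneA", "Cancer", "GeneC"],
    "Cancer": ["GeneB"],
    "GeneC": ["GeneB"],
}
# Hypothetical seed weights standing in for evidence observed in raw data.
toy_seeds = {"VitaminK": 1.0, "Cancer": 1.0}

toy_scores = biased_random_walk(toy_graph, toy_seeds)
for name, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

The resulting scores sum to one and can be read as condition-specific node weights: nodes close to the seeds rank higher than structurally similar nodes far from them, which is the kind of signal that could be used to prune paths in a returned KG.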

Key facts

NIH application ID: 10330633
Project number: 3OT2TR003450-01S1
Recipient: INSTITUTE FOR SYSTEMS BIOLOGY
Principal Investigator: SERGIO E BARANZINI
Activity code: OT2
Funding institute: NIH
Fiscal year: 2021
Award amount: $437,326
Award type: 3
Project period: 2020-01-24 → 2022-01-23