Robust Inference from Observational Data with Distributed Representations of Conceptual Relations

NIH RePORTER · NIH · R01 · $505,146

Abstract

The need to monitor unintended effects of approved drugs has been highlighted by several recent high-profile events in which fatal side effects of drugs were detected after their release to market. Notoriously, the COX-2 inhibitor rofecoxib (Vioxx) was withdrawn from the market on account of evidence suggesting that treatment with the drug increased the rate of myocardial infarction. More recently, proton pump inhibitors have been associated with a host of previously undetected serious side effects, including chronic kidney disease. Statistical analyses of several sorts of data have been undertaken in an effort to mitigate the morbidity and mortality resulting from such side effects by accelerating their detection. These include data from adverse event reporting systems, electronic health records (EHR) and administrative claims data, social media communication, and consumer search logs. Each of these sources presents challenges related to data completeness, accuracy, quality and representation, as well as the potential for bias. Though methods for combining multiple data sources show some promise as a way to address their particular inadequacies, strongly correlated drug-event pairs emerging from secondary analysis of observational data must ultimately be reviewed by domain experts to assess their implications. As the availability of the prerequisite expertise is limited, there is a pressing need for new methods to distinguish plausibly causal relationships from the large number of false positive associations that may emerge from large-scale analysis of observational data. In the proposed research, we will develop automated methods through which large amounts of knowledge extracted from the biomedical literature are used to constrain the parameterization of predictive models of large data sets.
These methods will leverage high-dimensional distributed vector representations of conceptual relations extracted from the literature to integrate extracted knowledge into predictive models of observational data. Our hypothesis is that the predictions that result from such joint models will be both biologically plausible and strongly associated, resulting in more accurate predictions than those that can be obtained through estimation of correlation from observational data alone. The developed methods will be evaluated formatively for accuracy against a set of drug/side-effect reference standards, and summatively for their ability to predict label changes such as “black box” warnings, using historical data and knowledge to estimate their “time-to-detection” of safety concerns. In addition, we will develop and evaluate an interactive interface permitting users to explore the evidence used by the resulting models to make predictions, by retrieving supporting assertions from the literature and statistics from observational data. If successful, the proposed research will provide the means to identify plausible drug-event pairs for regulatory purposes, mitigating c...

Key facts

NIH application ID: 9928987
Project number: 5R01LM011563-07
Recipient: UNIVERSITY OF WASHINGTON
Principal Investigator: Trevor Cohen
Activity code: R01
Funding institute: NIH
Fiscal year: 2020
Award amount: $505,146
Award type: 5
Project period: 2013-09-01 → 2022-05-31