# Robust Inference from Observational Data with Distributed Representations of Conceptual Relations

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $505,146

## Abstract

The need to monitor unintended effects of approved drugs has been highlighted by several recent high-profile
events in which fatal side effects of drugs were detected after their release to market. Notoriously, the Cox-2
inhibitor rofecoxib (Vioxx) was withdrawn from the market following evidence that treatment with
the drug increased the rate of myocardial infarction. More recently, proton pump inhibitors have been associated
with a host of previously undetected serious side effects, including chronic kidney disease. Statistical analyses
of several sorts of data have been undertaken to accelerate the detection of such side effects, and thereby
mitigate the morbidity and mortality they cause. These include data from adverse event reporting systems,
Electronic Health Records (EHR) and administrative claims data, social media communication and consumer
search logs. Each of these sources presents challenges related to data completeness, accuracy, quality and
representation, as well as the potential for bias. Though methods for combining multiple data sources show
some promise as a way to address their particular inadequacies, strongly correlated drug-event pairs emerging
from secondary analysis of observational data must ultimately be reviewed by domain experts to assess their
implications. As the availability of the prerequisite expertise is limited, there is a pressing need for new
methods to distinguish plausibly causal relationships from the large number of false positive associations that
may emerge from large-scale analysis of observational data. In the proposed research, we will develop
automated methods through which large amounts of knowledge extracted from the biomedical literature are
used to constrain the parameterization of predictive models of large data sets. These methods will leverage
high-dimensional distributed vector representations of conceptual relations extracted from the literature to
integrate extracted knowledge into predictive models of observational data. Our hypothesis is that the
predictions that result from such joint models will be both biologically plausible and strongly associated,
resulting in more accurate predictions than those that can be obtained through estimation of correlation from
observational data alone. The developed methods will be evaluated formatively for accuracy against a set of
drug/side-effect reference standards, and summatively for their ability to predict label changes such as
“black box” warnings from historical data and knowledge, yielding an estimate of their “time-to-detection” of safety
concerns. In addition, we will develop and evaluate an interactive interface permitting users to explore the
evidence used by the resulting models to make predictions, by retrieving supporting assertions from the
literature and statistics from observational data. If successful, the proposed research will provide the means to
identify plausible drug-event pairs for regulatory purposes, mitigating c...
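The abstract does not specify how relations are encoded as vectors. As a minimal illustrative sketch, assuming bipolar random vectors with elementwise-multiplication binding (one common hyperdimensional-computing scheme, not necessarily the one used in this project), literature-derived predications can be summed into a concept's vector, and a drug/side-effect pair can then be scored for plausibility by the intermediate concepts it shares. All concept names, predications, the dimensionality, and the helper functions below are hypothetical.

```python
import numpy as np

DIM = 10_000  # hypothetical dimensionality; random vectors this large are nearly orthogonal
rng = np.random.default_rng(42)


def random_bipolar(dim: int = DIM) -> np.ndarray:
    """Elemental vector with entries drawn uniformly from {-1, +1}."""
    return rng.choice(np.array([-1.0, 1.0]), size=dim)


def bind(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Bind by elementwise multiplication; for bipolar b, bind(bind(a, b), b) == a."""
    return a * b


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical concepts and relation types, standing in for literature-extracted predications.
concepts = {name: random_bipolar() for name in
            ["drug_A", "drug_B", "gene_X", "gene_Y", "gene_Z", "disease_D", "effect_E"]}
predicates = {name: random_bipolar() for name in ["INHIBITS", "TREATS", "ASSOCIATED_WITH"]}

# Hypothetical (predicate, object) assertions about each subject concept.
predications = {
    "drug_A": [("INHIBITS", "gene_X"), ("TREATS", "disease_D")],
    "drug_B": [("INHIBITS", "gene_Z"), ("TREATS", "disease_D")],
    "effect_E": [("ASSOCIATED_WITH", "gene_X"), ("ASSOCIATED_WITH", "gene_Y")],
}


def semantic_vector(assertions):
    """Bundle (sum) the bound predicate/object pairs describing one subject concept."""
    return np.sum([bind(predicates[p], concepts[o]) for p, o in assertions], axis=0)


semantic = {subject: semantic_vector(a) for subject, a in predications.items()}


def plausibility(drug: str, effect: str) -> float:
    """Score a drug/effect pair by genes the drug INHIBITS that are ASSOCIATED_WITH the effect.

    Unbinding with a predicate recovers the bundle of that predicate's objects, while
    assertions made with other predicates turn into near-orthogonal noise.
    """
    targets = bind(semantic[drug], predicates["INHIBITS"])
    genes_for_effect = bind(semantic[effect], predicates["ASSOCIATED_WITH"])
    return cosine(targets, genes_for_effect)


plaus_A = plausibility("drug_A", "effect_E")  # shares gene_X with effect_E
plaus_B = plausibility("drug_B", "effect_E")  # shares no gene with effect_E
print(f"drug_A -> effect_E: {plaus_A:.3f}")
print(f"drug_B -> effect_E: {plaus_B:.3f}")
```

With these hypothetical inputs, drug_A should score around 0.5 and drug_B close to zero, even though both drug vectors also carry an unrelated TREATS assertion. In a joint model of the kind proposed above, a literature-derived score like this would be considered alongside an association statistic estimated from observational data; how the two are combined is not specified in this sketch.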

## Key facts

- **NIH application ID:** 9928987
- **Project number:** 5R01LM011563-07
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Trevor Cohen
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $505,146
- **Award type:** 5
- **Project period:** 2013-09-01 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9928987

## Citation

> US National Institutes of Health, RePORTER application 9928987, Robust Inference from Observational Data with Distributed Representations of Conceptual Relations (5R01LM011563-07). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9928987. Licensed CC0.
