# Mining Social Media Big Data for Toxicovigilance: Automating the Monitoring of Prescription Medication Abuse via Natural Language Processing and Machine Learning Methods

> **NIH R01** · EMORY UNIVERSITY · 2020 · $332,269

## Abstract

Project Summary
The problem of prescription medication (PM) abuse has reached epidemic proportions in the
United States. According to a 2014 report by the Director of the National Institute on Drug
Abuse (NIDA), an estimated 52 million people have been involved in the non-medical use of
PMs, a significant portion of which can be classified as abuse. PMs that are commonly abused
include opioids, central nervous system depressants and stimulants, and the consequences of
their abuse may be severe. Increases in PM misuse and abuse over the last 15 years have resulted
in increased emergency department visits, rates of addiction and overdose deaths. Due to the
rapidly escalating morbidity and mortality, PM abuse is now receiving national attention. The opioid
crisis, which has its roots in opioid-based PM abuse, has been declared a national emergency by
the president of the United States. Despite the problems associated with PM abuse, surveillance
programs such as prescription drug monitoring programs (PDMPs) are inadequate and suffer
from numerous shortcomings, thus limiting their usefulness in real life. Studies evaluating the
long-term effects of distinct classes of PMs on cohorts of abusers are scarce and expensive to
conduct. To better characterize the problem and to monitor it in real-time, new sources of
information need to be identified and novel monitoring techniques need to be developed. To
address these problems, our project aims to utilize social media data for performing
toxicovigilance. Social media encapsulates an abundance of knowledge about PM abuse and the
abusers in the form of noisy natural language text. At the heart of the proposed approach is a
machine learning system that can automatically distinguish between abuse-indicating and
non-abuse-indicating user posts collected from social media. Using this classification system, users will be
categorized into three groups: (i) abusers, (ii) medical users and (iii) non-users. The
developed system will collect longitudinal data for users exposed to the selected PMs via periodic
collection of their publicly available posts/discussions and automatically categorize them based
on age, gender and additional demographic features, when possible. This will enable
observational studies on targeted cohorts involving hundreds of thousands of
cohort members. The cohort studies will focus on analyzing the transition rates from medical
use to abuse for distinct PMs and transition rates from abuse of PMs to illicit analogs.
Implementation of this data-centric framework, which will be open source, will revolutionize the
mechanism by which PM abuse monitoring is performed and enable the future development of
intervention strategies targeted towards specific cohorts, at the most effective time periods.
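
The abstract describes a pipeline that goes from post-level labels to user-level groups to cohort transition rates, without specifying how each step is computed. The following is a minimal Python sketch of the last two steps under stated assumptions: the label names, the `categorize_user` threshold rule, and the `transition_rate` definition (fraction of users seen in a source group who are later seen in a target group) are all hypothetical illustrations, not the project's actual method, and the data is made up.

```python
from collections import Counter

# Hypothetical post-level labels, standing in for a classifier's output.
ABUSE, MEDICAL, UNRELATED = "abuse", "medical", "unrelated"


def categorize_user(post_labels, min_abuse_posts=1):
    """Map one user's post labels for one time period to a user-level group.

    Assumed rule (illustrative only): any period with at least
    `min_abuse_posts` abuse-indicating posts marks the user as an abuser;
    otherwise any medical-use post marks a medical user; else non-user.
    """
    counts = Counter(post_labels)
    if counts[ABUSE] >= min_abuse_posts:
        return "abuser"
    if counts[MEDICAL] >= 1:
        return "medical_user"
    return "non_user"


def transition_rate(timelines, source, target):
    """Fraction of users observed in `source` who later appear in `target`.

    `timelines` maps a user id to a chronologically ordered list of groups.
    Returns None when no user was ever observed in `source`.
    """
    at_risk = 0
    transitioned = 0
    for states in timelines.values():
        if source not in states:
            continue
        at_risk += 1
        first_seen = states.index(source)
        if target in states[first_seen + 1:]:
            transitioned += 1
    return transitioned / at_risk if at_risk else None


# Made-up example: each user has three collection periods of labeled posts.
posts_by_user = {
    "u1": [[MEDICAL], [MEDICAL, ABUSE], [ABUSE]],
    "u2": [[MEDICAL], [MEDICAL], [UNRELATED]],
    "u3": [[ABUSE], [ABUSE], [ABUSE]],
    "u4": [[UNRELATED], [MEDICAL], [ABUSE, ABUSE]],
}

timelines = {
    user: [categorize_user(period) for period in periods]
    for user, periods in posts_by_user.items()
}

rate = transition_rate(timelines, "medical_user", "abuser")
print(f"medical use -> abuse transition rate: {rate:.2f}")
```

The same `transition_rate` function could be applied to other group pairs, such as PM abuse to illicit analogs, provided the categorization step produced a corresponding group. A real cohort analysis would also need to account for observation time and users who stop posting, which this sketch ignores.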

## Key facts

- **NIH application ID:** 9933852
- **Project number:** 5R01DA046619-04
- **Recipient organization:** EMORY UNIVERSITY
- **Principal Investigator:** Abeed H Sarker
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $332,269
- **Award type:** 5
- **Project period:** 2019-09-01 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9933852

## Citation

> US National Institutes of Health, RePORTER application 9933852, Mining Social Media Big Data for Toxicovigilance: Automating the Monitoring of Prescription Medication Abuse via Natural Language Processing and Machine Learning Methods (5R01DA046619-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9933852. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
