# Diversity Supplement for Mining Social Media Big Data for Toxicovigilance

> **NIH R01** · EMORY UNIVERSITY · 2024 · $82,238

## Abstract

The epidemic of substance use (SU) and substance use disorder (SUD) in the United States has been evolving
for decades. Both prescription and illicit drugs have been involved in overdose deaths over the years, with
notable increases in synthetic opioids (e.g., fentanyl & analogs) and psychostimulants (e.g., methamphetamine)
in recent years. The emergence of high-potency novel psychoactive substances (NPSs), such as fentanyl
analogs, has drastically contributed to rising deaths and adversely impacted treatment engagement and
response. The COVID-19 pandemic has further exacerbated the crisis, and recent studies have also highlighted
that substantial disparities exist in SUD treatment, research, interest, and response across different
subpopulations, with racial/ethnic minorities being disproportionately impacted. A key element to tackling the
crisis is improved surveillance. Specifically, there is a need for establishing novel approaches to provide timely
insights about the trends, distributions, and trajectories of the SUD epidemic, as traditional surveillance
approaches involve considerable lags. Many recent studies have identified social media (SM) as a useful
resource for conducting SU/SUD surveillance. Many people use SM to discuss personal experiences, provide
advice, or seek answers to questions regarding SU/SUD, resulting in the generation of an abundance of
information. Such information can be characterized, aggregated and analyzed to obtain population- or
subpopulation-level insights, at low cost and in near real time. However, converting SM data into timely,
actionable knowledge is non-trivial since the data is big, complex, and noisy, requiring the development of
advanced, automated artificial intelligence methods. Funded by the National Institute on Drug Abuse, our past
work focused specifically on prescription medications (PM) and established the most sophisticated SM-based
data mining pipeline available to date. In the parent proposal, we are expanding our pipeline to attempt to solve
previously unaddressed problems including (i) detecting novel psychoactive substances, both prescription
and illicit, (ii) characterizing stigmatizing language, and (iii) studying long-term trends in the impact of substance use.
In the proposed supplement, we will focus on studying a specific substance that has emerged as a national
problem: xylazine. The two specific aims of the supplement are as follows: (i) Characterize and quantify the
reported adverse effects and clinical and social impacts of xylazine, leveraging state-of-the-art NLP and
machine learning methods and close-to-real-time big data from Twitter (X) and Reddit; and (ii) Analyze the impact
of xylazine over time across geographic locations and different population groups. The supplement provides
training on advanced data science, including machine learning, natural language processing (NLP), generative
AI and large language models (LLMs), and the applications of these ar...

## Key facts

- **NIH application ID:** 11053091
- **Project number:** 3R01DA057599-01S1
- **Recipient organization:** EMORY UNIVERSITY
- **Principal Investigator:** Abeed H Sarker
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $82,238
- **Award type:** 3
- **Project period:** 2022-09-30 → 2025-09-29

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/11053091

## Citation

> US National Institutes of Health, RePORTER application 11053091, Diversity Supplement for Mining Social Media Big Data for Toxicovigilance (3R01DA057599-01S1). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/11053091. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
