Diversity Supplement for Mining Social Media Big Data for Toxicovigilance

NIH RePORTER · NIH · R01 · $82,238

Abstract

Project Summary: The epidemic of substance use (SU) and substance use disorder (SUD) in the United States has been evolving for decades. Both prescription and illicit drugs have been involved in overdose deaths over the years, with notable increases in synthetic opioids (e.g., fentanyl and analogs) and psychostimulants (e.g., methamphetamine) in recent years. The emergence of high-potency novel psychoactive substances (NPSs), such as fentanyl analogs, has drastically contributed to rising deaths and adversely impacted treatment engagement and response. The COVID-19 pandemic has further exacerbated the crisis, and recent studies have also highlighted that substantial disparities exist in SUD treatment, research, interest, and response across different subpopulations, with racial/ethnic minorities being disproportionately impacted. A key element in tackling the crisis is improved surveillance. Specifically, there is a need to establish novel approaches that provide timely insights into the trends, distributions, and trajectories of the SUD epidemic, as traditional surveillance approaches involve considerable lags. Many recent studies have identified social media (SM) as a useful resource for conducting SU/SUD surveillance. Many people use SM to discuss personal experiences, provide advice, or seek answers to questions regarding SU/SUD, generating an abundance of information. Such information can be characterized, aggregated, and analyzed to obtain population- or subpopulation-level insights at low cost and in near real time. However, converting SM data into timely, actionable knowledge is non-trivial since the data is big, complex, and noisy, requiring the development of advanced, automated artificial intelligence methods. Funded by the National Institute on Drug Abuse, our past work focused specifically on prescription medications (PM) and established the most sophisticated SM-based data mining pipeline available to date.
In the parent proposal, we are expanding our pipeline to attempt to solve previously unaddressed problems, including (i) detecting novel psychoactive substances, both prescription and illicit; (ii) characterizing stigmatizing language; and (iii) studying long-term trends in the impact of substance use. In the proposed supplement, we will focus on studying a specific substance that has emerged as a national problem: xylazine. The two specific aims of the supplement are as follows: (i) Characterize and quantify the reported adverse effects and clinical and social impacts of xylazine, leveraging state-of-the-art natural language processing (NLP) and machine learning methods and close-to-real-time big data from Twitter (X) and Reddit; and (ii) Analyze the impact of xylazine over time across geographic locations and different population groups. The supplement provides training on advanced data science, including machine learning, NLP, generative AI and large language models (LLMs), and the applications of these ar...

Key facts

NIH application ID: 11053091
Project number: 3R01DA057599-01S1
Recipient: EMORY UNIVERSITY
Principal Investigator: Abeed H Sarker
Activity code: R01
Funding institute: NIH
Fiscal year: 2024
Award amount: $82,238
Award type: 3
Project period: 2022-09-30 → 2025-09-29