# Social Media Mining for Pharmacovigilance

> **NIH R01** · UNIVERSITY OF PENNSYLVANIA · 2021 · $67,501

## Abstract

In our parent grant, the overall goal is to develop novel NLP methods to leverage Social Media (SM) data for
specific pharmacovigilance (PV) efforts that are hindered by known drawbacks of traditional PV approaches.
We focus on methods to facilitate the use of longitudinal SM data for exploring (a) factors affecting medication
adherence and persistence among the general population (Aim 1), and (b) possible associations between
medications taken during pregnancy and pregnancy outcomes (Aim 2). We also address a major roadblock for
case-control studies based on this data: finding controls (Aim 3). In general, the methodological focus of the
parent grant is on advancing NLP methods that enable the large-scale integration of SM as a content-rich
source of user-generated data in epidemiological studies, including relevant data posted over time. We are
seeking to extend methods developed for Aims 1, 2, and 3 of the parent grant to focus on Alzheimer's Disease
and Alzheimer's Disease Related Dementia (AD/ADRD). The Objectives for this supplement are responsive to
NOT-AG-20-034, and include the following: Objective 1 is to develop and evaluate NLP methods to identify
and characterize a cohort of Twitter users affected by an AD/ADRD diagnosis, specifically, Twitter users who
declare that their direct relatives (parents, grandparents, or siblings), significant others, or even themselves,
have been diagnosed with AD/ADRD. The collected timelines of users in the cohort will be further mined for
demographic characterization (age, gender, ethnicity, and geographic location of residence) and specific
methods will be developed to extract familial relationship to the AD/ADRD patient, and information relevant to
two specific focus studies: 1) a study of early-onset (also known as young-onset) AD (EOAD) and potential lifetime
factors reflected in SM, and 2) a study of medications taken by users in the whole cohort, with specific attention to
neuropsychiatric symptoms also associated with AD/ADRD. Objective 2 is to develop and evaluate a
linguistic model that automatically classifies patients' written productions in SM to detect potential
cognitive impairment. The model will analyze postings over time on SM by people diagnosed with EOAD for signs of
impairment evident in the content, grammar, and form of their narrative text. In addition to methods
and systems already developed under the Parent Grant, the tasks leverage baseline approaches as well as
our own collection of Tweets. These resources put us in the unique position to complete research approaches
that would normally take several years well within the span of the year covered by the administrative
supplement, allowing us to quickly ramp up our collaborations and research for further proposal submissions to
the NIA or NIGMH.

## Key facts

- **NIH application ID:** 10289130
- **Project number:** 3R01LM011176-08S2
- **Recipient organization:** UNIVERSITY OF PENNSYLVANIA
- **Principal Investigator:** GRACIELA GONZALEZ HERNANDEZ
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $67,501
- **Award type:** 3
- **Project period:** 2012-09-10 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10289130

## Citation

> US National Institutes of Health, RePORTER application 10289130, Social Media Mining for Pharmacovigilance (3R01LM011176-08S2). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10289130. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
