# Mining real-time social media big data to monitor HIV: Development and Ethical Issues

> **NIH R01** · UNIVERSITY OF CALIFORNIA-IRVINE · 2020 · $737,082

## Abstract

Social “big data” holds information with wide-ranging implications for addressing issues along the HIV care
continuum. Social big data refers to information from social media and online platforms on which individuals
and communities create, share, and discuss content. One in four people worldwide, or over a billion people,
are publicly documenting their activities, intentions, moods, opinions, and social interactions on these sites.
They are doing so with increasing volume and velocity, including 400 million “tweets” per day on Twitter and
4.75 billion content items shared per day on Facebook. With an increasing number of these platforms
supporting access to publicly available user data, social big data analysis is a promising new approach for
attaining organic observations of behavior that can be used to monitor and predict real-world public health
problems, such as HIV incidence. New tools such as social data are therefore needed to supplement existing
HIV data collection methods.

In preliminary research, our team developed the first approach that identifies psychological and behavioral
characteristics from social big data (>550 million tweets) found to be associated with HIV diagnoses. Since
groups at the highest risk for HIV (e.g., minority populations) are the fastest-growing groups of Twitter users, and
because social media users have been found to publicly share personal information, we identified and
collected tweets suggesting HIV risk behaviors (e.g., drug use, high-risk sexual behaviors) and modeled
them alongside CDC statistics on HIV diagnoses. We found a significant positive relationship between
HIV-related tweets and county-level HIV cases, controlling for socioeconomic status measures and other variables.

The problem is that this approach is not currently scalable for use by HIV researchers and public health
organizations. Although public health agencies are interested in mining social data to address HIV, current
tools are not accessible to most health scientists, as the tools require advanced computer science expertise.
For example, analyzing 500 million tweets a day requires expertise in big data engineering, advanced machine
learning, natural language processing, and artificial intelligence. Developing a single platform for mining social
data that has been designed and tested by and for HIV researchers could have a significant impact on HIV
prevention, testing, and treatment. We seek to create a single automated platform that collects social media
data; identifies, codes, and labels tweets that suggest HIV-related behaviors; and ultimately predicts regional
HIV incidence. Because of the potential ethical issues associated with mining people's data, we also seek to
interview staff at local and regional HIV organizations and participants affected by HIV to gain their perspectives
on the ethical issues associated with this approach. The software developed from this application will be
shared with HIV researchers and health car...

## Key facts

- **NIH application ID:** 9903175
- **Project number:** 5R01AI132030-05
- **Recipient organization:** UNIVERSITY OF CALIFORNIA-IRVINE
- **Principal Investigator:** Sean Young
- **Activity code:** R01 (R01, R21, SBIR, etc.)
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $737,082
- **Award type:** 5
- **Project period:** 2017-04-01 → 2023-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9903175

## Citation

> US National Institutes of Health, RePORTER application 9903175, Mining real-time social media big data to monitor HIV: Development and Ethical Issues (5R01AI132030-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9903175. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
