# Using big data and deep learning on predicting HIV transmission risk in MSM population

> **NIH R56** · UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON · 2020 · $801,194

## Abstract

Project Summary
An important global public health priority is to develop new methods for identifying populations at greatest HIV
risk, understanding HIV transmission network patterns, and intervening to reduce network risk. HIV testing is
important to effect positive sexual behavior changes, and is an entry point to treatment, care, and psychosocial
support. At the end of 2016, an estimated 1.1 million persons aged 13 and older were living with HIV infection
in the United States, including an estimated 162,500 (14%) persons whose infections had not been diagnosed.
In addition, many persons with HIV are tested late in the course of infection. Late testing results in missed
opportunities for prevention and treatment of HIV, and increased risk for transmission to their partners.
Current status: A number of epidemiological studies have applied social network theory and network analytical
techniques to examine the structural characteristics of HIV transmission networks among MSM, using phylogenetic
links (HIV-1 pol sequences) and/or sexual, social, and drug-using contacts. These studies, however, usually
reduce the network information to summary measures, consider only a subset of network variables, and/or use
only one layer of the multi-dimensional networks that determine transmission paths, such as the social, sexual,
contact, or venue layer.
Challenges: The complexity of the data needed for HIV infection risk analysis makes risk and transmission
prediction difficult. More specifically, we face two challenges: (1) how to develop a
mechanism to faithfully and flexibly represent the multi-dimensional network data collected from different
sources at different time periods; and (2) once the data have been integrated, how to fully leverage them to
develop a risk prediction algorithm that considers the multi-dimensional networks and their substantially
interrelated factors in a comprehensive manner.
Goals: We hypothesize that deep learning-based informatics approaches can provide a novel way to predict HIV
infection risk. In close collaboration with the public health department, we will construct a
comprehensive framework that combines population-based molecular, behavioral, and contact/partner tracing
information, including venue affiliation data and individual sex and drug-using behaviors, with existing
locally collected cohort data. Using these dynamically collected data, we will then develop practical deep-learning
algorithms that leverage the comprehensive framework to predict cluster growth and identify newly infected
populations. This proposal focuses specifically on ongoing epidemic growth among the populations most at risk,
including young men who have sex with men (MSM), who remain highly vulnerable to HIV in the U.S.
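
As a purely illustrative aid to challenge (1), the sketch below shows one possible way to hold multi-dimensional network data gathered at different time periods: each layer (for example sexual, social, or venue ties) is stored separately, and every tie carries the period in which it was recorded. This is not the project's actual data model. The class name `MultilayerNetwork`, its methods, the layer names, and the period labels are all hypothetical, and the identifiers in the usage note are synthetic.

```python
from collections import defaultdict


class MultilayerNetwork:
    """Hypothetical container: one undirected, time-stamped edge set per layer."""

    def __init__(self):
        # layer name -> node id -> set of (neighbor id, period label)
        self._layers = defaultdict(lambda: defaultdict(set))

    def add_edge(self, layer, u, v, period):
        """Record an undirected tie between u and v in one layer and period."""
        self._layers[layer][u].add((v, period))
        self._layers[layer][v].add((u, period))

    def neighbors(self, layer, node, period=None):
        """Neighbors of node in a layer, optionally limited to one period."""
        ties = self._layers.get(layer, {}).get(node, set())
        return {n for n, p in ties if period is None or p == period}

    def degree_profile(self, node):
        """Number of distinct neighbors of node in every layer."""
        return {
            layer: len({n for n, _ in adjacency.get(node, set())})
            for layer, adjacency in self._layers.items()
        }
```

With synthetic identifiers, `add_edge("venue", "p1", "p3", "2020Q1")` records a venue co-affiliation, and `degree_profile("p1")` returns a per-layer neighbor count that could serve as one simple input feature for a downstream model. Keeping the layers separate, rather than collapsing them into a single summary network, preserves the layer-specific structure that the abstract says earlier studies tended to discard.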

## Key facts

- **NIH application ID:** 10234761
- **Project number:** 1R56AI150272-01A1
- **Recipient organization:** UNIVERSITY OF TEXAS HLTH SCI CTR HOUSTON
- **Principal Investigator:** Kayo Fujimoto
- **Activity code:** R56
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $801,194
- **Award type:** 1
- **Project period:** 2020-09-01 → 2022-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10234761

## Citation

> US National Institutes of Health, RePORTER application 10234761, Using big data and deep learning on predicting HIV transmission risk in MSM population (1R56AI150272-01A1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10234761. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
