# Clinical Phenotyping for Prediction of Retention in HIV Care

> **NIH R21** · UNIVERSITY OF CHICAGO · 2023 · $452,376

## Abstract

Retention in care is essential to HIV treatment and prevention, yet only half of people with HIV in the U.S. are
retained in medical care. Improving retention is critical for ending the HIV epidemic in the U.S., but effective
retention interventions are highly resource-intensive. With diminishing resources for HIV care and increasing
prevalence of HIV, better approaches are needed to help HIV care teams identify the patients most vulnerable
to loss to follow-up (LTFU), who would benefit most from retention resources, before LTFU occurs. A predictive
model of LTFU based on electronic health data could address this need: it quantifies an individual
patient’s risk of future disengagement from care based on their unique characteristics and can be automated
to generate risk predictions in real time.

Using data from an urban HIV clinic, we have developed a machine learning model to predict LTFU from HIV
care using natural language processing (NLP) of the unstructured text of provider notes in the electronic medical
record (EMR). The NLP model performed well in identifying patients at risk for LTFU, with a
positive predictive value (PPV) of 0.86, and it identified word patterns associated with LTFU, such as “substance
abuse” and “stigma,” supporting its face validity.

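
Positive predictive value is the fraction of patients the model flags as high risk who actually go on to be lost to follow-up: PPV = TP / (TP + FP). The minimal sketch below assumes binary outcome labels (1 = LTFU) and a hypothetical 0.5 threshold on the model's risk score; the function names and toy data are illustrative only and do not come from the study.

```python
from __future__ import annotations

from typing import Sequence


def flag_high_risk(scores: Sequence[float], threshold: float = 0.5) -> list[int]:
    """Turn risk scores into binary LTFU predictions (1 = flagged).

    The 0.5 default threshold is a hypothetical choice for illustration.
    """
    return [1 if s >= threshold else 0 for s in scores]


def ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Positive predictive value, TP / (TP + FP), where 1 = lost to follow-up."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # PPV is undefined when the model flags nobody.
    return tp / (tp + fp) if tp + fp else float("nan")


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    scores = [0.9, 0.8, 0.7, 0.3, 0.2]
    outcomes = [1, 1, 0, 0, 1]
    preds = flag_high_risk(scores)
    print(preds, round(ppv(outcomes, preds), 3))
```

In this toy run three patients are flagged and two of them are truly lost to follow-up, so PPV = 2/3. Because PPV depends on both the threshold and how common LTFU is in the population being scored, the same model can yield different PPVs in different patient subgroups.
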
While our preliminary data reveal the potential of NLP-based machine learning models to predict future
retention in care, several key issues need to be addressed before the model can be deployed for patient care.
First, people with HIV (PWH) are a markedly heterogeneous population, and there may be subgroups of
patients (e.g., young Black men who have sex with men, cisgender women with childcare responsibilities,
people who inject drugs and are unstably housed) that differ drastically in the factors that predict
LTFU. Clinical phenotyping is an analytic method that clusters patients within a heterogeneous population
into subgroups based on the similarity of their profiles. Before a single model is deployed in a “one-size-fits-all”
manner, it is crucial to better understand how our NLP model performs across different clinical phenotypes
of patients with HIV. Second, it is not known how the model would perform in a prospective, real-life setting.
Finally, it is unclear how a machine learning model would perform compared with provider intuition regarding
patients’ risk for disengagement from care. This proposal seeks to address these issues through two specific
aims. In Aim 1, we will determine the performance of the NLP predictive model of LTFU for different clinical
phenotypes of people with HIV. In Aim 2, we will prospectively validate the model and compare results with
care team intuition regarding risk for LTFU among people with HIV.

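
Aim 1 involves grouping patients into phenotypes and then scoring the model separately within each group. The abstract does not name a clustering method, so the sketch below assumes plain k-means over hypothetical numeric patient features (in practice these would be standardized first) and reports PPV per phenotype. The functions `kmeans`, `ppv`, and `ppv_by_phenotype` and all toy data are hypothetical illustrations, not the study's pipeline.

```python
from __future__ import annotations

import math
from collections import defaultdict
from typing import Sequence


def _nearest(point: Sequence[float], centroids: list[list[float]]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points: Sequence[Sequence[float]], k: int, n_iter: int = 100) -> list[int]:
    """Lloyd's k-means; returns a phenotype (cluster) index for each patient.

    Simplification: the first k patients seed the centroids, so the result is
    deterministic but depends on row order.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of patients")
    centroids = [list(p) for p in points[:k]]
    labels: list[int] = []
    for _ in range(n_iter):
        # Assignment step: each patient joins its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ppv(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TP / (TP + FP) with 1 = lost to follow-up; NaN if nobody is flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else float("nan")


def ppv_by_phenotype(
    labels: Sequence[int], y_true: Sequence[int], y_pred: Sequence[int]
) -> dict[int, float]:
    """Compute the model's PPV separately within each phenotype."""
    groups: dict[int, tuple[list[int], list[int]]] = defaultdict(lambda: ([], []))
    for lab, t, p in zip(labels, y_true, y_pred):
        groups[lab][0].append(t)
        groups[lab][1].append(p)
    return {lab: ppv(ts, ps) for lab, (ts, ps) in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical, already-scaled features for six toy patients.
    patients = [[0.0, 0.0], [10.0, 10.0], [0.5, 0.2], [9.5, 10.2], [0.1, 0.4], [10.3, 9.8]]
    y_true = [1, 0, 0, 1, 1, 1]      # toy outcomes: 1 = lost to follow-up
    model_pred = [1, 1, 1, 1, 0, 1]  # toy model flags
    phenotypes = kmeans(patients, k=2)
    print(phenotypes)  # [0, 1, 0, 1, 0, 1]
    print(ppv_by_phenotype(phenotypes, y_true, model_pred))  # {0: 0.5, 1: 0.6666666666666666}
```

The toy run splits six patients into two phenotypes whose PPVs differ (0.5 vs. 2/3), the kind of gap a single pooled PPV would hide. The same `ppv` helper could also score care-team risk judgments on the same patients, giving a side-by-side comparison of the sort Aim 2 describes.
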
As we move toward ending the HIV epidemic, results from this project will provide crucial information regarding
the use of NLP and clinical phenotyping to predict loss to follow-up from HIV care an...

## Key facts

- **NIH application ID:** 10762595
- **Project number:** 1R21MH134756-01
- **Recipient organization:** UNIVERSITY OF CHICAGO
- **Principal Investigator:** Anoop Mayampurath
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $452,376
- **Award type:** 1 (new application)
- **Project period:** 2023-09-05 → 2025-09-04

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10762595

## Citation

> US National Institutes of Health, RePORTER application 10762595, Clinical Phenotyping for Prediction of Retention in HIV Care (1R21MH134756-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10762595. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*