# Improved multifactorial prediction of suicidal behavior through integration of multiple datasets

> **NIH R01** · MASSACHUSETTS GENERAL HOSPITAL · 2020 · $504,970

## Abstract

Suicide is the tenth leading cause of death in the United States, accounting for more than 40,000 deaths
annually. Despite ongoing efforts to reduce the burden of suicide and suicidal behavior, rates have remained
relatively constant over the past half century. Attempts to predict suicidal behavior have relied almost
exclusively on self-reporting of suicidal thoughts and intentions. This is problematic because of well-known
reporting biases and the fact that many people at high risk are motivated to deny suicidal thoughts to avoid
hospitalization. Even though the majority of all suicide decedents have contact with a healthcare professional
in the month before their death, suicide risk is rarely detected in such cases. Efforts to identify risk factors have
also been stymied by the fact that suicide is a low base-rate event, so very large samples are needed to
test the complex combinations of factors that are likely to contribute to risk. The widespread adoption of
longitudinal electronic health records (EHRs) has created a powerful but still under-utilized resource for
detecting and predicting important health outcomes. In prior work using machine learning methods to analyze
structured EHR data, we have developed predictive models that detect up to 45% of first-episode suicidal
behavior, on average 3 years in advance. Here we aim to systematically extend and improve our EHR
prediction methods in a large healthcare system (N = 4.6 million patients) by incorporating 1) external public
record datasets (LexisNexis SocioEconomic Health Attribute data) that include environmental, socioeconomic,
and life event information; 2) natural language processing (NLP) to leverage unstructured EHR text, including
text-based scores that capture RDoC domains; 3) a novel method of deriving temporal risk envelopes to
capture the time-dependent effects of individual risk factors; and 4) clinical risk trajectories that incorporate
ordered temporal sequences of risk factors. We will systematically compare the performance of each of these
approaches to identify optimal strategies for enhancing risk surveillance and prediction in healthcare settings.
Completion of these aims would represent a crucial step towards novel, clinically deployable, and potentially
transformative tools for improving outcomes for those at risk for suicide and suicidal behavior.

## Key facts

- **NIH application ID:** 9930662
- **Project number:** 5R01MH117599-03
- **Recipient organization:** MASSACHUSETTS GENERAL HOSPITAL
- **Principal Investigator:** Ben Y Reis
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $504,970
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-08-13 → 2022-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9930662

## Citation

> US National Institutes of Health, RePORTER application 9930662, Improved multifactorial prediction of suicidal behavior through integration of multiple datasets (5R01MH117599-03). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/9930662. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
