# Enabling Comparative Effectiveness Research in Silent Brain Infarction Through Natural Language Processing and Big Data

> **NIH R01** · TUFTS MEDICAL CENTER · 2020 · $520,509

## Abstract

It is a common clinical occurrence that neuroimaging scans obtained in the course of routine clinical care discover a prior brain infarction in patients with no history of stroke or transient ischemic attack. Indeed, epidemiologic studies indicate that silent brain infarctions (SBI) are far more common than strokes; MRI-defined SBI can be detected in ~20% of the healthy elderly. In these studies based on screened patients, the findings have been shown to be associated with subtle, typically unrecognized, deficits in physical and cognitive function. These imaging findings are also strong, independent risk factors for future stroke and dementia.

Despite the very high prevalence of SBI in screened populations, and their serious consequences, little is known about the significance or the appropriate management of SBI when discovered incidentally in routine care. While there is strong evidence that antiplatelet therapy and statin therapy are effective in preventing recurrent stroke in patients with prior stroke, the degree to which these results apply to patients with SBI is unclear. Because patients with SBI define a population that falls in between primary and secondary stroke prevention, the approach to these patients is marked by uncertainty and practice variation, making it an ideal condition for observational comparative effectiveness research. Nonetheless, there are serious challenges for the study of SBI. As patients have no overt symptoms, recruitment into a trial can be problematic. Comparative effectiveness research on SBI, using routinely collected data and leveraging the variation in care of these patients, is impeded by the fact that there are no ICD codes for SBI, and it is generally not included in structured fields of electronic health records (EHR) as it is typically considered an incidental finding.

In order to establish the comparative effectiveness of treatment strategies for SBI in a large, heterogeneous population, we propose to develop Natural Language Processing (NLP) algorithms to identify individuals with SBI through the automated review of neuroradiology reports. We have performed preliminary work to demonstrate that such an approach is feasible. We will then apply state-of-the-science observational comparative effectiveness methods in a massive database with linked EHR and claims data to examine the effectiveness of statins and antiplatelet agents in preventing future stroke and dementia. Thus, our aims are:

- **Aim 1:** We will develop NLP algorithms that can accurately identify cases of SBI and white matter disease (WMD) in two different health systems.
- **Aim 2:** We will port, refine, and validate the NLP algorithm in a large heterogeneous database including more than 600 hospitals and 6500 clinics and identify a large cohort of patients with routinely-discovered SBI and WMD.
- **Aim 3:** We will characterize the cohort with respect to age-specific prevalence of SBI, management patterns, and the risk of future stroke and dementia ass...
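
The abstract does not describe how the NLP algorithms work, so the following is only a minimal illustrative sketch of the case definition it states: an infarct reported on imaging in a patient with no history of stroke or transient ischemic attack. The term lists, the sentence-level negation rule, and the function names `mentions_infarct` and `is_candidate_sbi` are all assumptions made for illustration and are not taken from the project.

```python
import re

# Hypothetical vocabulary, chosen for illustration only (not the project's lexicon).
INFARCT_TERMS = re.compile(
    r"\b(infarct(?:ion|ions|s)?|lacun(?:e|es|ar))\b", re.IGNORECASE
)
# Hypothetical negation cues; real clinical NLP uses far richer rules.
NEGATION_CUES = re.compile(r"\b(no|without|negative for)\b", re.IGNORECASE)


def mentions_infarct(report: str) -> bool:
    """Return True if any sentence mentions an infarct term with no
    negation cue earlier in the same sentence."""
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        match = INFARCT_TERMS.search(sentence)
        if match is None:
            continue
        # Only the text before the first infarct term is checked for negation.
        prefix = sentence[: match.start()]
        if NEGATION_CUES.search(prefix):
            continue
        return True
    return False


def is_candidate_sbi(report: str, has_stroke_or_tia_history: bool) -> bool:
    """Flag a report as a candidate SBI case: an asserted infarct in a
    patient with no recorded history of stroke or TIA."""
    return mentions_infarct(report) and not has_stroke_or_tia_history
```

For example, `is_candidate_sbi("No acute infarct. Old lacunar infarct in the left thalamus.", False)` flags the report because the second sentence asserts an infarct without negation, whereas a report reading "No evidence of infarction or hemorrhage." is not flagged. A keyword rule like this would miss many phrasings and is meant only to make the screening idea concrete.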

## Key facts

- **NIH application ID:** 9931326
- **Project number:** 5R01NS102233-04
- **Recipient organization:** TUFTS MEDICAL CENTER
- **Principal Investigator:** DAVID M KENT
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $520,509
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2017-06-01 → 2023-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9931326

## Citation

> US National Institutes of Health, RePORTER application 9931326, Enabling Comparative Effectiveness Research in Silent Brain Infarction Through Natural Language Processing and Big Data (5R01NS102233-04). Retrieved via AI Analytics 2026-05-27 from https://api.ai-analytics.org/grant/nih/9931326. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
