# Multifile probabilistic record linkage for drug overdose surveillance and public health action

> **NIH R21** · UNIVERSITY OF WASHINGTON · 2021 · $220,455

## Abstract

Record linkage refers to the process of integrating data by identifying unique individuals within and across data
sources. Administrative databases typically contain only a few partial identifiers for each individual, such as
names or dates of birth; combined with typographical errors and missing data, this makes the record linkage
task difficult and error-prone. Probabilistic record linkage approaches have been shown to outperform
rule-based deterministic techniques because they adapt better to varying and higher levels of error in the
datafiles. Existing probabilistic approaches are nevertheless subject to various limitations. In practice, it is
common to encounter data
integration scenarios where multiple data sources need to be simultaneously merged and deduplicated using
imperfect information such as names, dates or addresses. These scenarios go beyond the specifications for
which commonly used record linkage and deduplication methodologies have been developed. We therefore
propose to extend the currently available, best-performing record linkage methodologies to simultaneously
integrate multiple datafiles and detect duplicated records within them. We will develop this methodology, along
with associated software and a graphical user interface, in partnership with Public Health – Seattle & King County
to ensure that they are responsive to real-world needs and challenges. We will also conduct a pilot study
implementing the techniques on King County administrative data systems used for overdose surveillance and
evaluation of overdose prevention programs.
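
To illustrate the contrast the abstract draws between deterministic and probabilistic linkage, the following is a minimal sketch of classical Fellegi-Sunter-style scoring for a single pair of records. It is not the project's methodology, which the abstract does not specify. The field names, the m- and u-probabilities, and the example records are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical parameters, for illustration only:
# m = P(field agrees | records refer to the same person)
# u = P(field agrees | records refer to different people)
FIELD_PARAMS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.92, 0.005),
    "dob": (0.95, 0.001),
}


def exact_match(rec_a, rec_b, fields=FIELD_PARAMS):
    """Deterministic rule: link only if every field is present and identical."""
    return all(
        rec_a.get(f) is not None and rec_a.get(f) == rec_b.get(f)
        for f in fields
    )


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Probabilistic score: sum of log2 likelihood ratios over compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # simplification: a missing value contributes no evidence
        if a.strip().lower() == b.strip().lower():
            total += math.log2(m / u)  # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up records: rec_2 has a one-letter typo in the last name.
rec_1 = {"first_name": "Maria", "last_name": "Lopez", "dob": "1980-02-14"}
rec_2 = {"first_name": "Maria", "last_name": "Lopes", "dob": "1980-02-14"}
rec_3 = {"first_name": "John", "last_name": "Smith", "dob": None}

score_typo = match_weight(rec_1, rec_2)
score_diff = match_weight(rec_1, rec_3)
```

With these assumed parameters, the exact-match rule rejects the `rec_1`/`rec_2` pair because of the single typo, while the probabilistic score stays positive because agreement on first name and date of birth outweighs the one disagreement. The `rec_1`/`rec_3` pair scores negative. In practice the m- and u-probabilities are estimated from the data rather than fixed by hand, and the score is compared against thresholds to decide on a link.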

## Key facts

- **NIH application ID:** 10200740
- **Project number:** 5R21DA051756-02
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Julia Elizabeth Hood
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $220,455
- **Award type:** 5
- **Project period:** 2020-07-01 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10200740

## Citation

> US National Institutes of Health, RePORTER application 10200740, Multifile probabilistic record linkage for drug overdose surveillance and public health action (5R21DA051756-02). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10200740. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
