Multifile probabilistic record linkage for drug overdose surveillance and public health action

NIH RePORTER · NIH · R21 · $220,455

Abstract

Record linkage is the process of integrating data by identifying unique individuals within and across data sources. Administrative databases commonly contain only a limited number of partial identifiers for each individual, such as names or dates of birth; combined with typographical errors and missing data, this makes record linkage difficult and error-prone. Probabilistic record linkage approaches have been shown to outperform rule-based deterministic techniques because they adapt better to varying and increased levels of error in the datafiles. Existing probabilistic approaches nevertheless have limitations. In practice, it is common to encounter data integration scenarios in which multiple data sources must be merged and deduplicated simultaneously using imperfect information such as names, dates, or addresses. These scenarios go beyond the specifications for which commonly used record linkage and deduplication methodologies were developed. We therefore propose to extend the best-performing record linkage methodologies currently available so that they simultaneously integrate multiple datafiles and detect duplicate records within them. We will develop this methodology, together with associated software and a graphical user interface, in partnership with Public Health – Seattle & King County to ensure that they are responsive to real-world needs and challenges. We will also conduct a pilot study implementing the techniques on King County administrative data systems used for overdose surveillance and for the evaluation of overdose prevention programs.
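To illustrate the contrast between deterministic and probabilistic linkage drawn in the abstract, the following is a minimal sketch of a classical Fellegi-Sunter style match weight for a pair of records. It is not the project's methodology, which targets the multifile setting with duplicates. The field names, the m- and u-probabilities, the example records, and the function names are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the project):
#   m = P(field agrees | the two records refer to the same person)
#   u = P(field agrees | the two records refer to different people)
FIELD_PARAMS = {
    "first_name": (0.90, 0.01),
    "last_name": (0.95, 0.005),
    "dob": (0.97, 0.001),
}


def deterministic_match(a, b):
    """Rule-based linkage: a match only if every field agrees exactly."""
    return all(a[f] == b[f] for f in FIELD_PARAMS)


def match_weight(a, b):
    """Sum of log2 likelihood ratios over fields.

    Agreement adds log2(m / u); disagreement adds log2((1 - m) / (1 - u)).
    A field missing in either record is skipped and contributes 0.
    """
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a[field] is None or b[field] is None:
            continue
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Made-up records: rec_b differs from rec_a by a one-letter typo in the last name.
rec_a = {"first_name": "maria", "last_name": "lopez", "dob": "1985-03-14"}
rec_b = {"first_name": "maria", "last_name": "lopes", "dob": "1985-03-14"}
rec_c = {"first_name": "john", "last_name": "smith", "dob": "1990-07-02"}

print(deterministic_match(rec_a, rec_b))  # exact-match rule rejects the typo pair
print(match_weight(rec_a, rec_b))         # positive: evidence for a match
print(match_weight(rec_a, rec_c))         # negative: evidence against a match
```

In this sketch the exact-match rule rejects the pair with the typo, while the summed weight stays positive because agreement on first name and date of birth outweighs the single disagreement. In practice the m- and u-probabilities are estimated from the data rather than fixed by hand, and a threshold on the weight decides which pairs are linked.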

Key facts

NIH application ID: 10200740
Project number: 5R21DA051756-02
Recipient: UNIVERSITY OF WASHINGTON
Principal Investigator: Julia Elizabeth Hood
Activity code: R21
Funding institute: NIH
Fiscal year: 2021
Award amount: $220,455
Award type: 5
Project period: 2020-07-01 → 2023-06-30