Clinical Text Automatic De-Identification to Support Large Scale Data Reuse and Sharing

NIH RePORTER · NIH · R42 · $725,232

Abstract

The adoption of Electronic Health Record (EHR) systems is growing rapidly in the U.S., making very large quantities of patient clinical data available in electronic format, with tremendous potential but an equally large concern for breaches of patient confidentiality. Secondary use of clinical data is essential to fulfill the potential for high-quality healthcare, improved healthcare management, and effective clinical research. NIH expects larger research projects to share their research data in a way that protects the confidentiality of research subjects. De-identification of patient data has been proposed as a solution to both facilitate secondary use of clinical data and protect patient data confidentiality. The majority of clinical data found in the EHR is represented as narrative clinical notes, and manual de-identification of clinical text is tedious and costly. Automated approaches based on Natural Language Processing have been implemented and evaluated, allowing for higher accuracy and much faster de-identification than manual approaches. Clinacuity, Inc. proposes to advance a text de-identification system from a prototype to an accurate, adaptable, and robust system, integrated into the research infrastructure at our implementation and testing site (Medical University of South Carolina, Charleston, SC), and ready for commercialization efforts.
To accomplish this undertaking, we will focus on the following specific aims and related objectives, while continuing to prepare the commercialization of the integrated system, with detailed market analysis, commercial roadmap development, and modern media communication: 1) Enhance the performance, scalability, and quality of the text de-identification system to produce an enterprise-grade solution ready for deployment; 2) Enable use of structured data for enhanced text de-identification (when structured PII is available) and for de-identification of complete patient records (i.e., records combining structured and unstructured data). This aim also includes implementing “one-way” pseudo-identifier cryptographic hashing to enable secure linking of already de-identified patient records; 3) Integrate the text de-identification system with a research data capture and management system. This includes implementation of the de-identification system as a secure web service, with standards-based access and integration. This de-identification system has potential commercial applications in clinical research and in healthcare settings. It will give clinical researchers access to the richer, more detailed, and more accurate clinical data found in clinical text. It will ease research data sharing (as expected for larger NIH-funded research projects) and help healthcare organizations protect patient data confidentiality. It will also offer significant time savings, with a process at least 200 to 1,000 times faster than manual de-identification.
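The “one-way” pseudo-identifier hashing mentioned in aim 2 can be illustrated with a minimal sketch. The abstract does not specify an algorithm, so the choice of HMAC-SHA256, the function name `pseudo_identifier`, the key, and the sample records below are all assumptions made for illustration, not a description of the system itself. The idea shown is that the same patient identifier always maps to the same pseudo-identifier, so separately de-identified records can be linked without exposing the original identifier.

```python
import hashlib
import hmac


def pseudo_identifier(patient_id: str, secret_key: bytes) -> str:
    """Return a one-way pseudo-identifier for patient_id.

    Assumption: HMAC-SHA256 stands in for the unspecified hashing scheme.
    The identifier is normalized (trimmed, upper-cased) so that trivial
    formatting differences do not break linkage.
    """
    normalized = patient_id.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be stored apart from the shared data.
SECRET_KEY = b"example-key-kept-outside-the-dataset"

# Hypothetical records from two sources that refer to the same patient.
notes = [{"mrn": "mrn-0012345", "text": "[de-identified note]"}]
labs = [{"mrn": "MRN-0012345 ", "result": "[de-identified lab]"}]

# Replace the identifier with its pseudo-identifier before sharing.
deid_notes = [
    {"pid": pseudo_identifier(r["mrn"], SECRET_KEY), "text": r["text"]}
    for r in notes
]
deid_labs = [
    {"pid": pseudo_identifier(r["mrn"], SECRET_KEY), "result": r["result"]}
    for r in labs
]

# The two de-identified records can still be linked through the shared pid.
linked = deid_notes[0]["pid"] == deid_labs[0]["pid"]
print(linked)
```

A keyed hash is used in this sketch rather than a plain hash because patient identifiers are typically short and guessable, so an unkeyed hash could be reversed by enumerating candidate identifiers; whether the actual system takes this approach is not stated in the abstract.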

Key facts

NIH application ID
10098325
Project number
5R42GM116479-03
Recipient
CLINACUITY, INC.
Principal Investigator
STEPHANE MEYSTRE
Activity code
R42
Funding institute
NIH
Fiscal year
2021
Award amount
$725,232
Award type
5
Project period
2016-02-15 → 2022-07-31