Hidden Markov methodology for machine learning applied to identifying physiological states of shock in the intensive care unit via biomedical and unstructured text data

NIH RePORTER · NIH · R56 · $494,527

Abstract

PROJECT SUMMARY There is a shortage of well-developed machine learning tools for patient monitoring and diagnosis in the clinical hospital setting. This absence is particularly relevant for an intensive care unit (ICU), where structured and unstructured data are continuously recorded on numerous aspects of the health status of each patient. The methods that have been developed are found predominantly in the research literature, and they focus on models/algorithms trained on a single data type, such as continuous response vitals data or natural language data. Likewise, evaluation of these machine learning tools often rests on a single metric, such as out-of-sample prediction error. For machine learning tools to become effective, they need to be built from models that can incorporate varying data types simultaneously when making inferences on the health state of a patient, and they need to be evaluated on a variety of metrics. Some of these metrics must be precise (e.g., out-of-sample prediction error and false negative/positive rate), but qualitative metrics must also be considered, such as clinical utility/feasibility, scalability, and the practicality of the user interface for a clinician. By analogy to hypothesis testing problems, there is an important difference between statistical significance and practical significance. The proposed research aims to develop statistical methodology that addresses these key aspects and to engineer machine learning tools for hospital patient monitoring and diagnosis. In particular, the focus is on rapid identification of critically ill patients at risk for bleeding and physiological deterioration such as shock. For the purpose of this research, shock is divided into four categories: hypovolemic shock, distributive shock, neurogenic shock, and cardiogenic shock.
Historical ICU patient encounter data are gathered with numerous examples of patients exhibiting each of these health states, as well as baseline encounters exhibiting no shock. However, the timing and detection of shock through clinician diagnosis are neither precise nor free of error. Accordingly, training data labels are only ever partially available, and the developed machine learning methodology will account for the semi-supervised nature of the problem. To make inference on the shock-related health state of ICU patients, the machine learning methodology will incorporate a variety of response data types. These emitted responses include continuously monitored vitals data, laboratory results, functional waveform data on blood pressure, unstructured text data from clinician and procedural notes, and typical cross-sectional data on medical history and demographic information. Addressing the data integration challenges of building all of these responses into a single parsimonious model will be a strong contribution of the proposed research. Additionally, the proposed research plan spans from the methodolo...

Key facts

NIH application ID: 10098894
Project number: 1R56HL155373-01
Recipient: NORTH CAROLINA STATE UNIVERSITY RALEIGH
Principal Investigator: Jonathan Paul Williams
Activity code: R56
Funding institute: NIH
Fiscal year: 2021
Award amount: $494,527
Award type: 1
Project period: 2021-09-20 → 2023-08-31