Project Summary

Carbapenem-resistant Enterobacterales (CRE) cause more than 13,000 infections in U.S. inpatients annually, with mortality rates that can exceed 50%. In hospitals, asymptomatic colonization is emerging as a critical target for CRE infection prevention. Epidemiologically, early identification of colonized patients can reduce intra-hospital spread; clinically, colonized inpatients face significantly higher, but potentially modifiable, risks of CRE infection. Due to diagnostic limitations, however, wide-scale CRE colonization screening remains impractical for most U.S. hospitals. Prediction models offer alternative strategies for identifying patients at high risk of colonization and of subsequent infection. However, these models face two methodological obstacles that limit their wider utility: (1) strong colonization risk factors are “locked” in electronic health record (EHR) free text that is unavailable for model-building unless records are reviewed manually; and (2) due to low CRE prevalence, most statistical models can evaluate only a limited number of candidate variables. We propose to exploit state-of-the-art machine learning and natural language processing (NLP) techniques to improve identification of CRE-colonized and infected patients. We will apply these methods to EHRs from >21,000 patients screened for CRE at The University of Maryland and The Johns Hopkins hospitals. In Aim 1, we will build and validate NLP algorithms on admission histories to detect pre-admission exposures that are strong colonization risk factors but poorly captured in structured EHR data fields. NLP is a cutting-edge computational technique for “unlocking” these types of unstructured data. We will also use text-mining approaches to identify potential new or local CRE risk factors.
In Aims 2 and 3, we will build and validate models from NLP-derived variables and other EHR data to predict colonization at admission (Aim 2) and progression to infection (Aim 3), using machine learning algorithms that excel on high-dimensional data. Taken together, this work will help hospitals identify patients at high risk of CRE colonization and infection early, when deleterious patient outcomes are still preventable. Because NLP is automated, successful models could be exported to other hospitals and integrated into EHRs; all algorithms resulting from this work will be made freely available. This will be the first study to deploy NLP for bacterial carriage screening and the largest U.S. study to follow CRE-colonized inpatients for infection. As a PhD-trained epidemiologist with a background in CRE and machine learning who previously practiced FDA law, I am drawn to rigorous, interdisciplinary approaches and policies for reducing the toll of antibiotic resistance in hospitalized patients. In the short term, Career Development Award support would allow me to build experience using sophisticated computational approaches for EHR-based information extraction and predictive mode...