PROJECT SUMMARY Trauma is one of the leading causes of death and disability in the US and around the world. Accurate measurement is critical to improving our understanding of this disease and gauging the effectiveness of interventions. Tracking the burden of traumatic injuries relies on identifying not only deaths but also non-fatal injuries. The widely used International Classification of Diseases (ICD) diagnosis coding system, developed by the World Health Organization, does not have a mechanism for directly measuring injury severity. To measure injury severity, ICD codes are often converted to the Abbreviated Injury Scale (AIS). Each AIS code has a measure of relative injury severity, and multiple codes can be combined to determine the overall injury severity of an individual patient. However, the currently used methods for converting ICD to AIS rely on one-to-one mapping between these coding systems, which has many inherent difficulties. Specifically, these one-to-one mappings have been shown to systematically underestimate overall injury severity. Recent advances in computational linguistics have solved very similar problems using embeddings and deep learning. We intend to apply these techniques to ICD-to-AIS translation. The key innovation is to consider all the information available about a patient simultaneously, rather than converting each code in isolation. The objective of this R03 proposal is to develop tools that improve the accuracy of population-level injury research that uses ICD codes. We will accomplish this objective by: (1) developing a tool to predict overall injury severity for individual patients from ICD codes, and (2) developing a tool to translate ICD codes to AIS for individual patients. Modern language translation algorithms are based on determining the location of words in an embedding space, so that words with similar meanings are near each other and the relative locations encode relationships between words. 
Similarly, we will map ICD codes into an embedding space, which will be used by subsequent deep learning modules to produce our results. The National Trauma Data Bank (NTDB) contains data for millions of trauma patients, including both ICD and AIS codes extracted by expert coders. We will use this data to train and evaluate the deep learning models that will underlie our tools. Together, these tools will meet the critical need to improve the quality of trauma research and increase the accuracy of injury monitoring using administrative medical databases.
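As an illustration of how the severities attached to multiple AIS codes can be combined into an overall severity for one patient, the sketch below computes the Injury Severity Score (ISS), a commonly used summary measure. The summary above does not name a specific overall score, so the choice of ISS is an assumption, and the function name, region labels, and input format are hypothetical.

```python
from collections import defaultdict


def injury_severity_score(injuries):
    """Compute an ISS from (body_region, ais_severity) pairs.

    Assumption: each injury is already assigned to one ISS body region
    and carries an AIS severity from 1 (minor) to 6 (maximal).
    """
    worst_by_region = defaultdict(int)
    for region, severity in injuries:
        if not 1 <= severity <= 6:
            raise ValueError(f"AIS severity out of range: {severity}")
        # Keep only the most severe injury in each body region.
        worst_by_region[region] = max(worst_by_region[region], severity)

    if not worst_by_region:
        return 0  # no coded injuries
    if max(worst_by_region.values()) == 6:
        return 75  # by convention, any AIS 6 injury sets ISS to its maximum

    # Sum the squares of the three most severely injured regions.
    top_three = sorted(worst_by_region.values(), reverse=True)[:3]
    return sum(s * s for s in top_three)


# Hypothetical patient: two chest injuries, one head, one extremity.
patient = [("head", 4), ("chest", 3), ("chest", 2), ("extremities", 2)]
print(injury_severity_score(patient))
```

Because the score depends only on the worst injury in each region, a one-to-one code conversion that lowers the severity of a single key injury can shift the overall score substantially, which is one way the underestimation described above can arise.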