Project Summary

Electronic health records contain a wealth of information about patient health status that can be mined for multiple purposes, including clinical research and improved decision-making at the point of care. This information can be represented as structured variables, unstructured text, and images, among other data types. In this work, we develop new models for representing the unstructured text that build on powerful neural models called pre-trained transformers. We propose to make these models usable for much longer texts by shrinking the encoder that operates on small chunks of text and adding hierarchical layers that operate over summaries of those chunks. First, we develop a smaller encoder for sentence- and paragraph-sized texts using a technique called extreme distillation, which trains smaller models from the output of larger models. Second, we propose to pre-train hierarchical models for text that build on smaller encoders such as the one developed in the first aim. We use both public and private datasets and experiment with different pre-training tasks and architectures. Our final aim proposes to combine representations learned from text with those from the more mature areas of structured data and images. We design experiments to determine how best to merge these different information sources, and we apply the resulting models to important clinical classification use cases that are likely to require multiple information sources for accurate performance. Specifically, we address the clinical tasks of predicting injury severity in emergency departments and predicting the diagnosis and prognosis of patients in intensive care units.