PROJECT ABSTRACT

The primary goal of this project is to leverage large, harmonized data resources comprising a broad range of patients with heart failure (HF), using machine learning (ML) to develop and test complex models that predict clinical outcomes and identify HF phenotypes that may be clinically important based on pathophysiology, prognosis, and treatment response. We will accomplish this through secondary analysis of 25 clinical trials, 6 large epidemiologic studies, and electronic health record (EHR) data totaling ~130,000 patients with HF. Of these, more than 40,270 are derived from 21 BioLINCC datasets, 43,536 from industry-sponsored studies, and 45,763 from the EHR. By drawing on studies that vary in population, design, timeframe, and data source, we envisage that our phenotypes will be (a) more reflective of the spectrum of patients encountered in real-world clinical practice and (b) more consistently identifiable from routinely collected clinical data. Improved characterization of outcomes according to HF phenotype may in turn facilitate personalization of HF management, in terms of both therapies and treatment goals. We hypothesize that predictive and phenotyping models generated using these resources will outperform existing models across a range of data sources and clinical populations. The primary overlapping Aims of this proposal are:

1. Use data from 74,308 patients in 25 completed clinical trials to characterize survival and treatment response according to simple characteristics, predictive models, and complex phenotypes. We will apply both supervised and unsupervised ML methods to this dataset in one of the largest individual patient data meta-analyses of HF clinical trial data to date. We will then compare the predictive value of these models to that of established models derived using conventional regression and survival analysis.

2. Validate models from Aim 1, explore novel phenotypes, and describe associated clinical characteristics prior to HF diagnosis in 9,734 patients with incident HF from observational cohorts. Using data from 6 large studies, including the Framingham Heart Study, we will validate established models and models from Aim 1. We will also identify major phenotypes not well represented in clinical trials and attempt to identify clinical risk factors that precede development of specific HF phenotypes.

3. Validate phenotype characteristics, associations, and outcomes in 45,763 patients with HF using retrospective EHR data from the University of Colorado's clinical data warehouse. We will test all predictive and patient phenotype models derived in Aims 1 and 2 using these harmonized real-world data and again identify phenotypes not well represented in the other datasets. Because of known health disparities in clinical practice, we will describe care patterns according to patient phenotype that may impact outcomes.