ABSTRACT The transition from paper charts to electronic health records (EHRs), advances in computing power and storage capacity, and the availability of sophisticated machine learning algorithms have transformed researchers' ability to use Big Data to answer important clinical questions. However, maximizing the utility of the rich clinical data in EHRs and clinical registries depends on researchers being able to accurately identify which patients have specific diseases; to classify conditions by disease severity; to ascertain which patients are improving, stable, or deteriorating; and to identify and quantify clinically relevant outcomes. Currently, nearly all researchers who work with Big Data in ophthalmology rely exclusively on administrative billing codes to identify common ocular diseases and outcomes of interest. Yet research has shown that sole reliance on billing codes is fraught with limitations and does not take full advantage of the useful information routinely captured in structured and free-text EHR data. In this proposal, we will develop, rigorously test, and validate an innovative approach that permits researchers to more accurately identify and classify patients with common sight-threatening ocular diseases and to capture transitions from less to more severe disease states and key outcomes of interest. Based on our preliminary studies, we believe our approach to enhanced ocular phenotype identification is substantially more accurate than exclusive reliance on billing codes. In Aim 1, we will apply this approach to EHR data to identify and categorize patients with 3 of the most common causes of irreversible vision loss: glaucoma, diabetic retinopathy, and macular degeneration. 
In Aim 2, we will extend enhanced phenotype identification to search for novel forms of these 3 conditions, using cluster analysis to identify groups of clinical features associated with these diseases that co-segregate. We will also test whether any of these clusters are associated with better or worse clinical outcomes. In Aim 3, we will apply our approach to identify key ocular outcomes in EHR data, such as disease stability and progression from less to more advanced stages, for each of the 3 ocular diseases of interest. Fulfilling the aims of this proposal will permit researchers throughout the country and the world to more accurately identify important ocular diseases and outcomes in EHR and clinical registry datasets. This capability will serve as a key building block for incorporating Big Data into machine learning and artificial intelligence applications, genotype-phenotype association studies, patient recruitment for clinical trials, and many other clinical and translational research projects.