PROJECT SUMMARY
ANCA-associated vasculitis (AAV) is a small vessel vasculitis associated with disease- and treatment-related complications that contribute to reduced quality of life and excess mortality compared with the general population. As rates of flare and mortality improve with contemporary treatments, attention is shifting to complications (e.g., renal failure, infection, cardiovascular disease) as clinically relevant, patient-oriented outcomes. However, our understanding of how best to address and prevent complications is limited because they are typically studied in isolation, within a “single disease framework.” We do not understand how complications tend to co-occur within individuals as complication clusters. Moreover, with several treatment options available for AAV, comparative effectiveness studies that use real-world data and relevant outcomes like complication clusters are needed to guide treatment decisions in a manner that personalizes care, improves quality of life, and reduces mortality. However, we lack methods to accurately and efficiently assemble an AAV cohort using state-of-the-art algorithms that leverage heterogeneous claims and electronic health record (EHR) data. The aims of this proposal are to (1) apply advanced clinical informatics methods (i.e., machine learning and natural language processing) to identify AAV cases in big data and assemble a large cohort and (2) determine complication clusters in an AAV cohort by applying latent transition analysis. To achieve these aims, we will leverage methodologic expertise developed through collaborations established during the PI’s K23 and use a novel data source that includes EHR data linked to Medicare and Medicaid claims.
The PI’s team has previously demonstrated that unstructured (i.e., free-text) EHR data can be used to study topics mentioned in the clinical notes of AAV patients and that keywords in these notes can help identify AAV patients, but neither machine learning nor sophisticated natural language processing has previously been used to identify AAV cases. In addition, our prior work has examined AAV complications in isolation (e.g., renal disease, cardiovascular disease); here we seek to identify phenotypes of complications (complication clusters) that tend to co-occur in patients, how patients transition between clusters over time, and what factors predict a person’s membership in a complication cluster. The major goal of this proposal is to build further preliminary data in preparation for an R01 application over the next 24 months. The planned R01 will focus on comparative effectiveness studies in AAV using cohorts assembled in big data and clinically relevant, patient-oriented outcomes, like complication clusters. The results of these studies can then be used as inputs in simulation models built during the PI’s K23 to guide optimal patient-oriented treatment decisions. Ultimately, the goal of this research program is to i...