PROJECT SUMMARY/ABSTRACT
Randomized clinical trials (RCTs) are the gold standard for assessing treatment safety and efficacy and are the primary evidence supporting FDA's regulatory decisions. However, RCTs have a number of limitations, including limited generalizability of study findings and insufficient follow-up to assess long-term outcomes. With the growing availability of disease-modifying treatments (DMTs), novel approaches are needed to monitor the long-term safety and efficacy of agents used in chronic diseases. Electronic health record (EHR) data present the opportunity to capture longitudinal treatment response in heterogeneous patient populations and real-world settings and can be used to generate real-world evidence (RWE) to augment RCT data for these drugs. However, availability of RWE for DMTs has been limited by the lack of computable information on disease progression measures, which are the clinical outcomes monitored by physicians directing therapy. This information is typically captured only in unstructured text during clinical visits and may not be consistently documented at every encounter, resulting in incomplete data even with labor-intensive manual abstraction. Further, it is critical that RWE resources for robust postmarket assessments of DMTs ensure reproducibility of findings across healthcare systems. In this proposal, we address this unmet need by developing methods to generate reproducible and generalizable RWE on unstructured efficacy and adverse event (AE) endpoints used in the evaluation of therapies for rheumatoid arthritis and multiple sclerosis. We will create scalable disease progression endpoints from EHR data by linking information in EHRs to registry data and building algorithms for ordinal disease activity scores using features derived from scoring guidelines.
In Aim 1, we integrate disease activity and progression data from registries to generate scalable RWE on disease progression endpoints leveraging structured and free-text EHR data. In Aim 2, we develop strategies to correct for noise in medication prescriptions for DMTs in RWE studies. In Aim 3, we combine EHR data from multiple healthcare systems through federated learning to ensure generalizability of RWE. We intend for these methods to build new capabilities for the use of RWE in FDA's regulatory decisions on drug effectiveness, providing an efficient, scalable, and robust approach to using real-world clinical data to support approval of new drug indications and the conduct of postmarket studies for DMTs.