Project Summary/Abstract
There is a critical need for accurate, efficient, and portable substance use disorder (SUD) identification methods that help large-scale genetic studies uncover biological mechanisms relevant to SUD prevention and management. The long-term goal of this work is to support genetic discovery efforts that result in beneficial interventions for the prevention and treatment of SUD. The overall objective of this proposal is to develop and evaluate an SUD phenotyping method that allows investigators to fine-tune phenotypes for their specific projects. Because most large-scale SUD phenotyping for genetic studies has relied on administrative billing codes (which undercount true cases) and binary outcome labels (which impose arbitrary dichotomization), the rationale for this project is that a system is needed that draws on a variety of data sources and generates a probabilistic outcome along a continuum. The SUD phenotyping framework will support heterogeneous electronic health record data types (including administrative billing codes, medication information, and unstructured text data) and will be evaluated at multiple organizations. Pairing these SUD phenotypes with genetic data will enhance our understanding of SUD mechanisms among individuals. This foundational work could ultimately result in the development of polygenic risk scores and clinical decision support systems that could be implemented prospectively in clinical care. The proposed research is significant because researchers have a pressing need for SUD phenotyping approaches that can be customized to their research focus and available data. This proposal's innovation lies in the creation of a "self-service" approach to SUD phenotype development in which research teams can specify their own phenotype definitions. 
The software will have a graphical user interface that makes the highest-yield rules and heuristics the easiest to apply, so that investigators with only basic scientific programming knowledge can use it. Through the activities outlined above, this work will directly accelerate genetic studies of SUD while establishing a precision phenotyping framework that can be applied to other disease domains.