Abstract
This career development award proposal aims to create an immersive training experience in the context of studying the genetic etiology of substance use disorders (SUDs). Completing the research and training aims will provide the applicant with a unique skillset at the intersection of psychiatric genetics, SUD epidemiology and health disparities, SUD psychopharmacology, clinical informatics, and bioinformatics. Genome-wide association studies (GWAS) have been valuable for genetic discovery and for dissecting the biology of SUDs, but their study design needs improvement in four areas. First, SUD GWAS typically account only for diagnostic status for the focal SUD of interest; however, substance co-use and SUD co-occurrence are common and may affect the interpretation of findings. Second, SUD GWAS often rely on diagnostic codes recorded in electronic health records (EHRs), which miss substance use that is not captured by a SUD diagnosis. EHR-based toxicology data can provide finer resolution of substance use and indicate whether someone has been exposed to a specific substance. Third, substance exposure information is essential: a person must initiate use of a substance for a SUD to develop. Assessing a person's genetic liability for a SUD therefore requires knowing whether that individual has been exposed to the substance. Defining substance-exposed controls ensures that cases and controls are accurately designated and allows the genetic effects specific to SUD risk to be isolated. Fourth, GWAS have been performed largely in European-ancestry samples. Prior efforts have underscored the need to extend GWAS to diverse ancestries, but insufficient attention has been given to racial disparities in SUD GWAS. Including genetically diverse populations and examining social determinants of health are both important for addressing health disparities in SUD GWAS.
This proposal seeks to address these limitations using the Million Veteran Program (MVP), a large and diverse biobank that includes genetic, environmental, and medical information, including EHRs that contain SUD diagnoses and drug toxicology data. EHR data will be used to identify diagnosed SUDs and co-occurring SUDs for each MVP participant, and drug toxicology data will be used to detect additional substance use. Combining EHR SUD diagnostic codes with toxicology results will provide a comprehensive summary of substance use for each participant. This will benefit SUD GWAS by: (1) modeling patterns of SUD co-occurrence and substance co-use; (2) providing substance-specific information that often goes undocumented by EHR codes alone; and (3) identifying substance-exposed controls, i.e., individuals who have used a substance but do not have a SUD diagnosis. Health disparities in SUD GWAS will be addressed by including all available genetic ancestry groups and examining disparities in rates of toxicology test administration across self-reported racial and sociodemo...