# Using novel data sources across genetic, biological, and social domains to refine genome-wide investigations of substance use disorders

> **NIH K01** · YALE UNIVERSITY · 2024 · $181,040

## Abstract

This career development award proposal aims to create an immersive training experience in the context of
studying the genetic etiology of substance use disorders (SUDs). The completion of the research and training
aims will provide the applicant with a unique skillset at the intersection of psychiatric genetics, SUD
epidemiology and health disparities, SUD psychopharmacology, clinical informatics, and bioinformatics.
Genome-wide association studies (GWAS) have been valuable for genetic discovery and dissecting the
biology of SUDs, but improvements to study design are needed. First, SUD GWAS typically account only for
diagnostic status for the focal SUD of interest; however, substance co-use and SUD co-occurrence are
common and may impact interpretation of findings. Second, SUD GWAS often rely on diagnostic codes that
are included in electronic health records (EHRs) but miss other substance use not captured by a SUD
diagnosis. EHR-based substance toxicology data can provide superior resolution of substance use and indicate
whether someone has been exposed to a specific substance. Third, substance exposure information is important
because a person must initiate use of a substance for a SUD to develop. Assessing a person’s genetic liability
for a SUD requires knowing whether that individual has been exposed to that substance. Defining
substance-exposed controls ensures that cases and controls are accurately designated and allows for the
isolation of the genetic effects specific to SUD risk. Fourth, GWAS have been largely performed in
European-ancestry samples. Efforts have
underscored the need to extend GWAS to diverse ancestries, but insufficient attention has been given to racial
disparities in SUD GWAS. Including genetically diverse populations and examining social
determinants of health are both important for addressing health disparities in SUD GWAS. This proposal seeks to
address these limitations using the Million Veteran Program (MVP) sample, a large and diverse biobank with
genetic, environmental, and medical information, including EHRs that contain SUD diagnoses and
drug toxicology data. EHR data will be used to identify diagnosed SUDs and co-occurring SUDs for each MVP
participant. Drug toxicology data will be used to identify additional substance use. Combining EHR SUD
diagnostic codes and toxicology results will provide a comprehensive summary of substance use for each MVP
participant. This will benefit SUD GWAS in terms of: (1) modeling patterns of SUD co-occurrence and
substance co-use; (2) providing substance use specificity that often goes undocumented by EHR codes alone;
and (3) the ability to identify substance-exposed controls who have used a substance but do not have a SUD
diagnosis. Health disparities in SUD GWAS will be addressed through the inclusion of all available
genetic ancestry groups and examination of disparities in rates of toxicology test administration across self-reported
racial and sociodemo...
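
The case and control designation described in the abstract can be illustrated with a minimal sketch. It is not the investigators' pipeline: the `Participant` fields, the `designate` function, and the three labels are hypothetical names introduced here, and a single positive toxicology result is assumed to be sufficient evidence of exposure.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    participant_id: str
    # Hypothetical flag: EHR contains a diagnostic code for the focal SUD.
    has_sud_diagnosis: bool
    # Hypothetical flag: at least one positive toxicology result
    # for the focal substance.
    positive_toxicology: bool


def designate(p: Participant) -> str:
    """Assign a GWAS group from diagnosis and toxicology evidence.

    Assumption for illustration only: a diagnosis defines a case, and a
    positive toxicology result without a diagnosis defines a
    substance-exposed control. Everyone else has no documented exposure.
    """
    if p.has_sud_diagnosis:
        return "case"
    if p.positive_toxicology:
        return "exposed_control"
    return "no_documented_exposure"


if __name__ == "__main__":
    # Made-up records, used only to exercise the function.
    cohort = [
        Participant("p1", has_sud_diagnosis=True, positive_toxicology=True),
        Participant("p2", has_sud_diagnosis=False, positive_toxicology=True),
        Participant("p3", has_sud_diagnosis=False, positive_toxicology=False),
    ]
    print(Counter(designate(p) for p in cohort))
```

Under these assumptions, participants with no documented exposure would be kept out of the control group, so the case versus control contrast reflects liability to the disorder rather than the likelihood of ever using the substance.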

## Key facts

- **NIH application ID:** 10985085
- **Project number:** 1K01DA058807-01A1
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** Joseph D. Deak
- **Activity code:** K01
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $181,040
- **Award type:** 1
- **Project period:** 2024-08-01 → 2029-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10985085

## Citation

> US National Institutes of Health, RePORTER application 10985085, Using novel data sources across genetic, biological, and social domains to refine genome-wide investigations of substance use disorders (1K01DA058807-01A1). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10985085. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
