# Making data from the center for GWAS in outbred rats FAIR and AI/ML ready

> **NIH P50** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2021 · $316,000

## Abstract

Summary (NOT-OD-21-094 Administrative Supplement to P50DA037844)

In this supplement application, we are seeking funds to improve the AI/ML-readiness of data
generated by the Center for GWAS in outbred rats. Since the center’s formation in 2014, we have collected
extensive data on more than 8,000 genetically unique heterogeneous stock (HS) rats and have secured
funding to grow that number to 16,000 by 2025. Data types include genotypes at millions of single
nucleotide polymorphisms (SNPs), complex behavioral and physiological phenotypes, RNASeq, ATACSeq,
single cell RNASeq, single cell ATACSeq, microbiome, and metabolomic data. While the center is focused
on traits relevant to substance abuse, these datasets are much more broadly applicable. They include other
behavioral traits relevant to all fields of neuroscience and physiological traits relevant to numerous organ
systems and diseases. These data have been carefully curated, including numerous human and automated
quality control steps, and are organized as data types available for each unique individual. However, there
is no public-facing description of the data, and no effort has been put into making them AI/ML-ready.

In this proposal, we will improve this situation by bringing together a team with expertise in 1) this
specific dataset, 2) best practices for information sharing, and 3) AI/ML for genetic applications. We will
begin by bringing the group together to identify the most important and addressable shortcomings. We will
then begin to address these shortcomings, meeting frequently to monitor progress and overcome unanticipated
challenges. Finally, as the work is completed, our extant network of AI/ML collaborators will perform simple
AI/ML exercises to confirm that the improvements are successful. This will be an iterative process, meaning
that we may revise specific action items over the course of the project in an effort to maximize impact. We
anticipate that improvements will include establishing a website and making all of our data findable. We will
use protocols.io to document each research protocol, will assign RRIDs to all individuals, and will use best
practices to make all data FAIR and AI/ML-ready. This supplement will provide the impetus and funding to
bring together an outstanding team to make sure that NIH’s investment in this unique dataset can be used
for cutting-edge AI/ML approaches. This project is within the scope of the parent award but does not
duplicate any work already supported by the parent grant.

## Key facts

- **NIH application ID:** 10409990
- **Project number:** 3P50DA037844-09S1
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Abraham A. Palmer
- **Activity code:** P50
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $316,000
- **Award type:** 3
- **Project period:** 2014-06-15 → 2024-04-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10409990

## Citation

> US National Institutes of Health, RePORTER application 10409990, Making data from the center for GWAS in outbred rats FAIR and AI/ML ready (3P50DA037844-09S1). Retrieved via AI Analytics 2026-05-26 from https://api.ai-analytics.org/grant/nih/10409990. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
