# Data Science Core

> **NIH U19** · STANFORD UNIVERSITY · 2023 · $499,843

## Abstract

Data Science Core (DSC)

Leads: Krishna Shenoy PhD and Chris Roat PhD (with Surya Ganguli PhD)

Project Summary

Given the large volumes of optical, electrical, genetic and behavioral data that will be generated, stored and
computationally analyzed, it is essential to establish a comprehensive and yet streamlined DSC. There are four
major data challenges that the DSC will address. (1) Data size. Each experimental lab will generate very large,
and rapidly increasing, datasets. We must contend with storing, pre-processing (e.g., spike sorting) and
processing (e.g., single-trial analyses) these large and growing datasets. (2) Metadata. Collaborations between
groups are often hampered by not fully capturing – in a searchable database and linked to the bulk data – all
animal and experiment conditions, or so-called metadata. We will build in capabilities and requirements to
electronically capture full metadata. (3) Data format. Collaborations are also often hampered by the effort
required to understand each lab’s dataset format. Data format often depends on whether a given measurement
system was custom built or relies on a commercial system. We will capture this information as part of the
metadata for historical data relevant to this U19, and moving forward we will adopt the increasingly popular
Neurodata Without Borders (NWB) data format. Finally, (4) Across animals and labs. Performing large-scale
analyses across many animals and labs is often truly onerous. This is because all three of the challenges
listed above combine, causing one to shy away from anything other than essential analyses (e.g., pooling results
across just a few mice in one specific condition). We will build our own data pipelines that automatically
query our metadata database and then retrieve the indicated experimental data, and we will also adopt the
increasingly popular DataJoint pipeline.
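
The query-then-retrieve pattern described above can be illustrated with a minimal sketch. It is not the DSC's actual implementation: it uses Python's built-in `sqlite3` module as a stand-in for the relational metadata database (not DataJoint), and the `sessions` table, its columns, the lab and animal identifiers, and the file paths are all hypothetical names chosen for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory metadata database with a few hypothetical sessions."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per recording session, linked to bulk data by path.
    conn.execute(
        """
        CREATE TABLE sessions (
            session_id INTEGER PRIMARY KEY,
            lab        TEXT NOT NULL,
            animal_id  TEXT NOT NULL,
            condition  TEXT NOT NULL,
            modality   TEXT NOT NULL,
            data_path  TEXT NOT NULL
        )
        """
    )
    rows = [
        ("lab_a", "mouse01", "reach_task", "ephys", "/data/lab_a/mouse01/s1.nwb"),
        ("lab_a", "mouse02", "reach_task", "imaging", "/data/lab_a/mouse02/s1.nwb"),
        ("lab_b", "mouse07", "reach_task", "ephys", "/data/lab_b/mouse07/s1.nwb"),
        ("lab_b", "mouse08", "rest", "ephys", "/data/lab_b/mouse08/s1.nwb"),
    ]
    conn.executemany(
        "INSERT INTO sessions (lab, animal_id, condition, modality, data_path) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def find_sessions(conn, condition, modality):
    """Return (lab, animal_id, data_path) for every session matching the metadata filter."""
    cursor = conn.execute(
        "SELECT lab, animal_id, data_path FROM sessions "
        "WHERE condition = ? AND modality = ? "
        "ORDER BY lab, animal_id",
        (condition, modality),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Pool electrophysiology sessions for one condition across labs and animals.
    for lab, animal_id, data_path in find_sessions(conn, "reach_task", "ephys"):
        print(lab, animal_id, data_path)
```

In a sketch like this, the metadata query returns only pointers to the bulk data, so an analysis can pool sessions across labs and animals before loading any large files.
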

Our DSC will be led by Prof. Shenoy, Dr. Roat (with considerable industrial-scale data handling
experience, and now at Stanford) and Prof. Ganguli (RP3 lead). Two full-time software engineers (TBD) will
implement the DSC architecture, including a bulk data server, a relational metadata database, data standards and
a data pipeline. The software engineers will work closely with the rest of the team to help ensure good
communication, and to help bring analysis code and documentation up to professional software standards for
dissemination. This will enable storage, retrieval and analysis of data in an efficient and modular way, so that
any piece of the data analysis pipeline can be replaced rapidly. Such modularity is essential for a creative
environment that also promotes rapid feedback of emerging ideas to subsequent experiments. We believe in Open Science,
including open source code (e.g., GitHub) and data formats. We will share data with the broader community,
including with other U19 consortia. Thus our DSC is critical to the success of our proposed research, and
serves as the central hub of our U19 resear...

## Key facts

- **NIH application ID:** 10687136
- **Project number:** 5U19NS118284-03
- **Recipient organization:** STANFORD UNIVERSITY
- **Principal Investigator:** Krishna V Shenoy
- **Activity code:** U19
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $499,843
- **Award type:** 5
- **Project period:** 2021-09-17 → 2026-08-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10687136

## Citation

> US National Institutes of Health, RePORTER application 10687136, Data Science Core (5U19NS118284-03). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10687136. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
