# Data Science Core

> **NIH U19** · ALLEN INSTITUTE · 2022 · $604,844

## Abstract

The Data Science Core (DSC) supports moving, storing, analyzing, and sharing data.
The data sets collected by this collaboration will be very large (petabyte scale) and multimodal, including brain-wide
mesoscale anatomy (Projects 1-4); spatial transcriptomics (Project 2 and the Molecular Science Core); large-scale in vivo
electrophysiology (Projects 3 and 4); brain slice synaptic physiology and voltage imaging (Project 3); behavior (Project
4); and large-scale simulations of neural circuits (Project 5). Implementing algorithms that extract knowledge from such
large and complex data sets demands professional data science and software engineering practices. The DSC will
configure the infrastructure needed to share data and analysis pipelines efficiently in the cloud. Individual research
projects also require support for implementing data analysis algorithms and the related software engineering; the
DSC will implement and refine these algorithms so that they can be applied to data at scale. In addition to
discoveries, research papers, and reagents, the data themselves are a major product of the proposed research. This team
is committed to open science and has a track record of delivering on that commitment. Anatomical, molecular, neurophysiological,
and behavioral data will be made available in widely used repositories in standardized data formats. The DSC
will support data sharing both within the team and with the scientific community at large.

The thalamus is a collection of nuclei that have traditionally been segmented using low-dimensional information,
such as cytoarchitecture and single-channel immunohistochemistry. The existing segmentations (i.e.,
anatomical atlases) are not sufficient to describe the rich functional architecture of the thalamus. We therefore
need to register all measurements precisely to thalamic subregions without relying on current notions of intrathalamic
boundaries. A major task for the DSC is to register anatomical (Projects 1 and 2), molecular (Project 2 and the MSC), and
neurophysiological (Projects 3 and 4) measurements at the highest possible resolution to a standardized reference
atlas, the Allen Common Coordinate Framework (CCF). Localizing all measurements within the thalamus is
an important step toward discovering its architecture and functional logic. The DSC will implement
efficient and accurate workflows for spatially aligning all data to the CCF.

The DSC will follow these principles: (i) implement best practices with respect to the FAIR (Findable,
Accessible, Interoperable, Reusable) data principles; (ii) reuse and extend community data standards; (iii) rely on
existing data repositories where available; and (iv) reuse and build on existing open-source software ecosystems.

## Key facts

- **NIH application ID:** 10294399
- **Project number:** 1U19NS123714-01
- **Recipient organization:** ALLEN INSTITUTE
- **Principal Investigator:** Karel Svoboda
- **Activity code:** U19
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $604,844
- **Award type:** 1 (new application)
- **Project period:** 2022-01-15 → 2026-12-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10294399

## Citation

> US National Institutes of Health, RePORTER application 10294399, Data Science Core (1U19NS123714-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10294399. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
