# Data Science Resource Core

> **NIH U19** · COLUMBIA UNIVERSITY HEALTH SCIENCES · 2022 · $563,980

## Abstract

The major theme of this proposal is a tightly closed loop of experiment, theory, and data analysis.
Sophisticated, scalable data science methods are a critical component of this loop.

The Data Science Core serves two primary purposes. First, we will apply and refine sophisticated data analysis
algorithms directly related to the project’s scientific goals. This project will generate massive streams of data
from multiple recording and simulation modalities: whole-cell electrophysiology and anatomy, large-scale
calcium imaging, spatiotemporally complex optogenetic perturbations, and RNA sequencing images, as well as
large-scale simulations of networks of spiking neurons. A correspondingly major effort is needed to manage
these data, distill them into new scientific knowledge, and design new experiments, theoretical analyses, and
simulations that close the theory-experiment-analysis loop. This will entail the application and iterative
refinement of algorithms for preprocessing the data (e.g., taking calcium imaging video and extracting demixed,
denoised neural activity for each cell visible in the field of view); aligning, registering, and performing
statistical inference on data across multiple modalities (e.g., calcium imaging, optogenetic stimulation, and
seqFISH); functionally characterizing the stimulus preferences and correlation structure of activity in the
observed cells; and developing closed-loop optimal experimental design methods to obtain richer, more
informative data.
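
As a rough illustration of the preprocessing step, a calcium imaging movie reshaped into a pixels × frames matrix `Y` can be approximated as `A @ C`, where each column of `A` is a nonnegative spatial footprint and each row of `C` is the corresponding cell's activity trace. The sketch below assumes plain nonnegative matrix factorization with Lee and Seung multiplicative updates on synthetic data. It is not the Core's demixing method, which would also need to model background, noise, and calcium dynamics, and the function name `demix_nmf` is hypothetical.

```python
import numpy as np


def demix_nmf(Y, n_cells, n_iter=500, seed=0, eps=1e-9):
    """Factor a nonnegative movie matrix Y (pixels x frames) as Y ~ A @ C.

    A (pixels x n_cells) holds spatial footprints; C (n_cells x frames) holds
    temporal traces. Hypothetical sketch using Lee-Seung multiplicative updates
    for the Frobenius-norm NMF objective.
    """
    rng = np.random.default_rng(seed)
    n_pixels, n_frames = Y.shape
    A = rng.random((n_pixels, n_cells)) + eps
    C = rng.random((n_cells, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep A and C nonnegative at every step.
        C *= (A.T @ Y) / (A.T @ A @ C + eps)
        A *= (Y @ C.T) / (A @ C @ C.T + eps)
    return A, C


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic movie: 2 cells with disjoint footprints on a 100-pixel field of view.
    A_true = np.zeros((100, 2))
    A_true[:50, 0] = 1.0
    A_true[50:, 1] = 1.0
    C_true = rng.random((2, 300))
    Y = A_true @ C_true + 0.01 * rng.random((100, 300))
    A, C = demix_nmf(Y, n_cells=2)
    rel_err = np.linalg.norm(Y - A @ C) / np.linalg.norm(Y)
    print(f"relative reconstruction error: {rel_err:.3f}")
```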
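
For the cross-modal alignment step, one standard building block is estimating the translation between two images of the same field of view (for example, a functional imaging reference frame and a later seqFISH image) by phase correlation. The sketch assumes the misalignment is a pure integer-pixel circular shift; real cross-modal registration usually also needs rotation, scaling, or nonrigid warps plus modality-specific preprocessing, so treat it as a minimal illustration. The name `estimate_shift` is hypothetical.

```python
import numpy as np


def estimate_shift(reference, moving):
    """Return integer (dy, dx) such that np.roll(moving, (dy, dx), axis=(0, 1)) ~ reference.

    Uses phase correlation (normalized cross-power spectrum) and assumes a
    pure circular translation between the two images.
    """
    F_ref = np.fft.fft2(reference)
    F_mov = np.fft.fft2(moving)
    cross_power = F_ref * np.conj(F_mov)
    cross_power /= np.abs(cross_power) + 1e-12  # keep only phase information
    corr = np.real(np.fft.ifft2(cross_power))   # sharp peak at the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices to signed shifts in [-N/2, N/2).
    shifts = [p if p < n // 2 else p - n for p, n in zip(peak, corr.shape)]
    return tuple(int(s) for s in shifts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.random((64, 64))
    moving = np.roll(reference, (-5, 7), axis=(0, 1))  # simulate a misaligned image
    print(estimate_shift(reference, moving))  # → (5, -7)
```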
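
Functional characterization can start from two simple summaries: each cell's mean response to each stimulus (a tuning curve, whose argmax gives a preferred stimulus) and the pairwise correlations of trial-to-trial fluctuations around those means (noise correlations). The sketch assumes a trials × cells response matrix and one integer stimulus label per trial; the array layout and the function name `tuning_and_noise_corr` are hypothetical, and a full analysis would likely fit richer encoding models.

```python
import numpy as np


def tuning_and_noise_corr(responses, stim_labels, n_stimuli):
    """Compute tuning curves, preferred stimuli, and noise correlations.

    responses: (n_trials, n_cells) array, one response value per cell per trial
        (e.g. mean activity in a response window).
    stim_labels: (n_trials,) integer stimulus index for each trial.
    Assumes every stimulus in range(n_stimuli) appears at least once.
    """
    n_cells = responses.shape[1]
    tuning = np.zeros((n_stimuli, n_cells))
    for s in range(n_stimuli):
        tuning[s] = responses[stim_labels == s].mean(axis=0)
    preferred = tuning.argmax(axis=0)  # preferred stimulus index per cell
    # Subtract each trial's stimulus-conditioned mean to isolate trial-to-trial noise.
    residuals = responses - tuning[stim_labels]
    noise_corr = np.corrcoef(residuals, rowvar=False)  # (n_cells, n_cells)
    return tuning, preferred, noise_corr
```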
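
Closed-loop optimal experimental design can be illustrated with a Bayesian linear-Gaussian encoding model, `r = w @ x + noise`, with a Gaussian prior on the weights `w`. After each trial the posterior is updated in closed form, and the next stimulus is chosen from a candidate set to maximize the expected information gain about `w`, which for this model is `0.5 * log(1 + x @ Sigma @ x / noise_var)`. The model, the candidate set, the noise level, and the class name `InfomaxDesigner` are illustrative assumptions, not the Core's stated method.

```python
import numpy as np


class InfomaxDesigner:
    """Sequential Bayesian design for a linear-Gaussian model r = w @ x + noise.

    Hypothetical sketch. Prior: w ~ N(0, prior_var * I); noise ~ N(0, noise_var).
    """

    def __init__(self, dim, prior_var=1.0, noise_var=0.1):
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # precision-weighted mean term

    @property
    def cov(self):
        return np.linalg.inv(self.precision)

    @property
    def mean(self):
        return self.cov @ self.b

    def choose(self, candidates):
        """Index of the candidate stimulus (row) with maximal expected information gain."""
        pred_var = np.einsum("ij,jk,ik->i", candidates, self.cov, candidates)
        gains = 0.5 * np.log1p(pred_var / self.noise_var)
        return int(np.argmax(gains))

    def update(self, x, r):
        """Conjugate Gaussian posterior update after observing response r to stimulus x."""
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * r / self.noise_var


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0, 0.5])
    candidates = rng.normal(size=(50, 3))  # hypothetical stimulus library
    designer = InfomaxDesigner(dim=3)
    for _ in range(30):
        x = candidates[designer.choose(candidates)]
        r = w_true @ x + rng.normal(scale=np.sqrt(designer.noise_var))
        designer.update(x, r)
    print(np.round(designer.mean, 2))
```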

Second, this Core will build a collaborative infrastructure that allows the multiple laboratories in this project to act
as one, sharing data and analysis tools and closely integrating theorists and experimentalists. This
infrastructure will be completely open source; build on current efforts to standardize neuroscience data; be
modular and extensible, allowing rapid iterative improvement of each stage of the algorithmic pipeline;
automatically archive and record algorithmic metadata describing versions and parameter choices, for easy
searchability and reproducibility; and support straightforward benchmarking. As we develop these practices and
tools for sharing data and analysis pipelines, we will make them immediately available to the community. We will
thus provide a model platform for vastly improving reproducibility, for keeping analysis pipelines up to date as
improved methods are developed, and, most importantly, for saving researchers from re-developing and
re-implementing analysis software and data storage and sharing solutions. We aim to make it easy for groups of
labs anywhere in the world to unite and crack large-scale neural circuits. This will transform the way
neuroscience is done.
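
The requirement that pipeline stages automatically record versioning and parameter metadata can be sketched as a decorator that, each time a stage runs, appends a JSON record of the stage name, a code version string, the parameters, a hash of the input, and a timestamp to an append-only log that can later be searched. Everything here (the names `record_provenance` and `query`, the record fields, the JSON Lines log, and passing the version string explicitly instead of reading it from version control) is a hypothetical design for illustration, not the Core's actual infrastructure.

```python
import functools
import hashlib
import json
import time
from pathlib import Path


def record_provenance(version, log_path="provenance.jsonl"):
    """Decorator logging version, parameters, and an input hash for each call.

    Hypothetical sketch: `version` would normally come from version control
    (e.g. a commit hash). Stage parameters must be passed as keywords, and both
    the input data and the parameters must be JSON-serializable.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, **params):
            # Hash a canonical JSON encoding of the input (lists, numbers, dicts).
            input_hash = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
            result = func(data, **params)
            record = {
                "stage": func.__name__,
                "version": version,
                "params": params,
                "input_sha256": input_hash,
                "timestamp": time.time(),
            }
            with Path(log_path).open("a") as f:
                f.write(json.dumps(record, sort_keys=True) + "\n")
            return result
        return wrapper
    return decorator


def query(log_path="provenance.jsonl", **criteria):
    """Return logged records whose fields equal all of the given criteria."""
    with Path(log_path).open() as f:
        records = [json.loads(line) for line in f]
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


@record_provenance(version="v0.1.0")
def threshold_events(trace, threshold=2.0):
    """Toy pipeline stage: indices where a trace exceeds a threshold."""
    return [i for i, v in enumerate(trace) if v > threshold]


if __name__ == "__main__":
    print(threshold_events([0.5, 2.5, 1.0, 3.1], threshold=2.0))  # → [1, 3]
    print(query(stage="threshold_events")[-1]["params"])  # → {'threshold': 2.0}
```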
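
Modularity and straightforward benchmarking fit together: if every implementation of a pipeline stage shares one call signature, alternatives can be registered by name and scored on the same ground-truth data with the same metric. The sketch below registers two hypothetical spike-detection variants for a single stage and scores them by F1 against known event times; the registry, stage and implementation names, threshold, matching tolerance, and metric are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence

# Registry mapping stage name -> {implementation name -> function}.
REGISTRY: Dict[str, Dict[str, Callable]] = {}


def register(stage: str, name: str):
    """Decorator adding an implementation to the registry under a stage."""
    def decorator(func: Callable) -> Callable:
        REGISTRY.setdefault(stage, {})[name] = func
        return func
    return decorator


@register("detect_spikes", "threshold")
def detect_threshold(trace: Sequence[float]) -> List[int]:
    """Flag every sample above a fixed threshold."""
    return [i for i, v in enumerate(trace) if v > 1.0]


@register("detect_spikes", "local_peak")
def detect_local_peaks(trace: Sequence[float]) -> List[int]:
    """Flag samples above the threshold that are also local maxima."""
    return [
        i for i in range(1, len(trace) - 1)
        if trace[i] > 1.0 and trace[i] >= trace[i - 1] and trace[i] > trace[i + 1]
    ]


def f1_score(pred: List[int], truth: List[int], tol: int = 1) -> float:
    """F1 where each true event matches at most one prediction within `tol` samples."""
    unmatched = list(truth)
    hits = 0
    for p in pred:
        match = next((t for t in unmatched if abs(t - p) <= tol), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    if hits == 0:
        return 0.0
    precision, recall = hits / len(pred), hits / len(truth)
    return 2 * precision * recall / (precision + recall)


def benchmark(stage: str, trace: Sequence[float], truth: List[int]) -> Dict[str, float]:
    """Score every registered implementation of a stage on the same data."""
    return {name: f1_score(fn(trace), truth) for name, fn in REGISTRY[stage].items()}


if __name__ == "__main__":
    trace = [0.0, 0.2, 1.5, 1.2, 0.1, 0.0, 2.0, 0.3]
    print(benchmark("detect_spikes", trace, truth=[2, 6]))
```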

## Key facts

- **NIH application ID:** 10438689
- **Project number:** 5U19NS107613-05
- **Recipient organization:** COLUMBIA UNIVERSITY HEALTH SCIENCES
- **Principal Investigator:** Liam M Paninski
- **Activity code:** U19
- **Funding institute:** NIH
- **Fiscal year:** 2022
- **Award amount:** $563,980
- **Award type:** 5 (non-competing continuation)
- **Project period:** 2018-09-15 → 2024-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10438689

## Citation

> US National Institutes of Health, RePORTER application 10438689, Data Science Resource Core (5U19NS107613-05). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/10438689. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*