Project Summary/Abstract: Core 2, Data Science
The Data Science Core will facilitate and standardize data collection and analysis for all research projects within this U19 program. In particular, we will develop processes and systems for collecting, organizing, and analyzing behavioral, imaging, electrophysiology, and neural manipulation data. To benefit the broader neuroscience community, we will adopt shared data and metadata formats and make our pipelines publicly available, using DataJoint as the common framework for scientific data pipelines and the Neurodata Without Borders format for sharing large raw datasets. This platform will facilitate collaborative analysis of datasets by multiple researchers within the project and make our analyses reproducible and extensible by others. We will make our code and data public in easy-to-find, open-access repositories, such as the BRAIN Initiative’s Distributed Archives for Neurophysiology Data Integration and GitHub. Our use of these common data standards will make the data interoperable and reusable, ensuring that our data publications adhere to FAIR guidelines. The core’s first aim will be to provide standardized computational pipelines for neurophysiological and behavioral data. We already have standardized data pipelines for collecting virtual-reality behavioral data and for preprocessing mesoscope imaging and Neuropixels electrophysiology recordings. We now propose to extend this effort to all data generated by the collaboration, via three new initiatives. First, we will construct a shared platform, accessed through modular, user-friendly web apps, to support virtual-reality and operant-conditioning tasks. Second, we will extend our preprocessing pipeline for electrophysiology and calcium imaging data to support several state-of-the-art segmentation algorithms.
Automating preprocessing and data transfer between systems, and standardizing manual curation steps, will make analyses faster and easier, enabling more effective and reproducible processing of neurophysiological data. Third, we will develop infrastructure to support perturbations during behavior, including optogenetic, pharmacological, and physical manipulations. The core’s second aim will be to document the system, train users, and disseminate our computational tools and workflows. This effort will alleviate burdens on researchers, accelerate research by promoting standard software tools, increase adoption of standardized pipelines, and facilitate reuse of our data by others. To facilitate training on and use of these pipelines, we will develop integrated web-based tools that allow worldwide access to and control of local data processing. The modular design of these tools will make them useful to, and readily adopted by, the broader neuroscience community. We will provide ongoing in-person training to all researchers and technicians, including yearly tutorials with external consultants. Together, these methods for automating and standardizing da...