# TR&D1: Data Science

> **NIH P41** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2021 · $254,003

## Abstract

PROJECT SUMMARY - TR&D1: DATA SCIENCE
Large-scale data aggregation has generated considerable interest within the neuroscience community, both for
its potential to increase the statistical significance of research results and for the reuse of data that has
already been collected. New federated approaches are needed to bring together research studies that operate
independently from one another and to manage the complex needs of data access, aggregation, harmonization,
and analysis. In Aim 1 we build upon our extensive experience in developing federated database systems and
propose a one-time application process that simplifies data access by consolidating disparate applications
across multiple research institutions. We also propose a single secure and unified pathway for downloading
binary and tabular files from different research studies that would significantly reduce the effort required to
retrieve these files. In Aim 2 we introduce a new approach for harmonizing data collected by different research
studies that incrementally applies transformations and provides immediate visual feedback with tabular updates
and interactive summaries. In Aim 3 we propose to integrate recent Docker technologies into our framework
and establish an archive for analyses that can be transferred to and executed on any Linux computer. Input and
output data will be linked to their respective analyses and used as query criteria when searching the archive. In
Aim 4 we propose a new mediator that bridges all the components of our framework. This
Analysis Assembler utilizes the unified pathway of Aim 1 and automatically downloads all files needed for an
analysis. After retrieving the analysis itself from the archive in Aim 3, the Assembler proceeds to execute the
analysis on the data files. After the analysis has completed, the Assembler records the provenance of all output
data, which will be made accessible in visual queries of our federated search system. In Aim 5 we propose to
extend our quality control system to use machine learning to automatically assign “poor” and “good” quality
ratings to neuroimaging MRI data. With the goal of locating hard-to-see artifacts, we also propose to implement
interactive 3D visualizations to more accurately assess image quality. All five of our aims provide a framework
upon which neuroscience can be conducted, shared, and replicated – comprising a foundation for reproducible
science.
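
The Aim 4 workflow (download inputs, retrieve the analysis, execute it, record provenance) can be illustrated with a minimal sketch. This is not the project's implementation: the names `assemble_and_run`, `ProvenanceRecord`, `fetch_file`, and `fetch_analysis` are hypothetical, a plain Python callable stands in for a containerized analysis, and SHA-256 digests are one assumed way to link input and output data to an analysis.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical type: an analysis maps named input files to named output files.
Analysis = Callable[[Dict[str, bytes]], Dict[str, bytes]]


@dataclass
class ProvenanceRecord:
    """Links one analysis run to digests of its inputs and outputs."""
    analysis_id: str
    input_digests: Dict[str, str]
    output_digests: Dict[str, str]


def digest(data: bytes) -> str:
    """Content hash used here as a stable identifier for a file."""
    return hashlib.sha256(data).hexdigest()


def assemble_and_run(
    analysis_id: str,
    file_ids: List[str],
    fetch_file: Callable[[str], bytes],
    fetch_analysis: Callable[[str], Analysis],
) -> Tuple[Dict[str, bytes], ProvenanceRecord]:
    # Step 1: download every input through a single access pathway.
    inputs = {fid: fetch_file(fid) for fid in file_ids}
    # Step 2: retrieve the analysis from the archive.
    analysis = fetch_analysis(analysis_id)
    # Step 3: execute the analysis on the downloaded files.
    outputs = analysis(inputs)
    # Step 4: record which inputs produced which outputs.
    record = ProvenanceRecord(
        analysis_id=analysis_id,
        input_digests={name: digest(data) for name, data in inputs.items()},
        output_digests={name: digest(data) for name, data in outputs.items()},
    )
    return outputs, record


# Usage with in-memory stand-ins for the file pathway and the archive.
files = {"a": b"abc", "b": b"de"}


def concat(inputs: Dict[str, bytes]) -> Dict[str, bytes]:
    return {"joined": b"".join(inputs[k] for k in sorted(inputs))}


archive = {"concat-v1": concat}
outputs, record = assemble_and_run(
    "concat-v1", ["a", "b"], files.__getitem__, archive.__getitem__
)
```

Because the record stores digests rather than file paths, a search over the archive could in principle match analyses by the data they consumed or produced, as the abstract describes for Aim 3.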

## Key facts

- **NIH application ID:** 10135691
- **Project number:** 5P41EB015922-24
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** ARTHUR W TOGA
- **Activity code:** P41
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $254,003
- **Award type:** 5
- **Project period:** 1998-09-30 → 2023-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10135691

## Citation

> US National Institutes of Health, RePORTER application 10135691, TR&D1: Data Science (5P41EB015922-24). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10135691. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
