# Data Management and Analysis Core

> **NIH P42** · NORTHEASTERN UNIVERSITY · 2024 · $520,589

## Abstract

PROJECT SUMMARY
The Data Management and Analysis Core (DMAC) plays a critical role in achieving the Center objectives by
serving as a central repository of Center data and providing for cross-indexing and linkage of the diverse data
sets produced by the environmental and biomedical projects and cores in the Center. The current PROTECT
Database System holds nearly 7 million cleaned and secure data entities. The DMAC is responsible for the
reliability of the data, including cleaning, replication and backup, as well as the protection of the data, including
de-identification of human subjects, and secure and authenticated access. The DMAC allows data generated
by the projects to be cross-indexed by all projects based on a global PROTECT Data Dictionary that includes
common index fields (subject ID, GIS coordinates) to foster sharing and integration. DMAC also provides a rich
set of modeling and statistical analysis toolsets and expertise to support Project-level objectives. The
combined collection of data and tools allows PROTECT to work seamlessly across project domains and
effectively ties environmental factors to human subject outcomes.

To support Center goals and ensure its long-term impact, we will continue to build upon the rich infrastructure
developed in the first eight years of this Center. We will continue to partner with EarthSoft, a major provider of
environmental data management software, to provide enhanced database capabilities appropriate for all
Center projects and cores. We will continue to support cleaning, indexing, documenting, and security of all
Center-based data through a secure, online, database system, as well as provide a common suite of advanced
statistical/analysis tools integrated into the backend of the database system. As part of the renewal, we will
expand our analytics support by adding Jennifer Dy, Justin Manjourides and Bhramar Mukherjee to the DMAC,
supporting machine learning and statistical analysis of mixtures that include phthalates, chlorinated volatile
organic compounds (CVOCs), polycyclic aromatic hydrocarbons (PAHs), metals and pesticides across all
projects. We will expand our use of mapping with a Geographic Information System (GIS), integrating analytics
and mapping into a common framework, making our data easily understood by a wide range of communities.

To achieve our Center-level aims that tie environmental factors to health-related outcomes, the DMAC will
continue to develop a common suite of analysis and visualization tools based on GIS, SAS, R and Python,
providing analysis tailored for each project, while also leveraging state-of-the-art software and frameworks. The
specific statistical tools developed for mixtures analysis will use RStudio’s data cleaning, visualization and
archiving functions, and will be disseminated through GitHub. The DMAC has already developed a suite of
Data Mining tools that provide regression and clustering analysis in an integrated online visualization
framework. Finall...

## Key facts

- **NIH application ID:** 10767250
- **Project number:** 5P42ES017198-14
- **Recipient organization:** NORTHEASTERN UNIVERSITY
- **Principal Investigator:** DAVID R KAELI
- **Activity code:** P42
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $520,589
- **Award type:** 5
- **Project period:** 2010-04-12 → 2026-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10767250

## Citation

> US National Institutes of Health, RePORTER application 10767250, Data Management and Analysis Core (5P42ES017198-14). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10767250. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
