# Core E: Data Sciences Core

> **NIH P50** · VANDERBILT UNIVERSITY MEDICAL CENTER · 2020 · $249,104

## Abstract

The success and impact of nearly every project in IDD hinges on the proper use of statistical techniques. Thus,
Core E has a critical role in facilitating research for all IDDRC investigators, as well as for the progress of the
other IDDRC Cores and Signature Research Project. Core E performs a unique function for IDDRC
investigators as it helps them identify and use the statistical and methodological expertise and resources
available at Vanderbilt University (VU) and Vanderbilt University Medical Center (VUMC) that are appropriate
for their questions – especially for more complicated research designs (e.g., many layers of nesting) or those
with statistical limitations (e.g., small sample sizes common in research with rare populations). Further, through
generative activity with Clinical Translational and Translational Neuroscience Cores B and C, Core E provides
sophisticated and non-trivial statistical methods and models tailored to IDD-related scientific questions (e.g.,
Bayesian spatio-temporal models for neuroimaging analysis). In addition to having considerable expertise in
biostatistics, neuro-statistics, and quantitative psychology, Vanderbilt is also a national leader in developing big
data structures and mining that data to advance health and development research, including the Synthetic
Derivative (SD), a de-identified dataset of electronic health record data collected from approximately 2.8 million total
records. Though such big data structures are incredible resources to Vanderbilt, and especially IDDRC
investigators with their ability to capture large samples of rare disorders, it can be challenging to put the data in
analyzable formats and select suitable statistical approaches for analysis. Core E enables IDDRC investigators
to fully capitalize on all these VU/VUMC resources through three aims: Aim 1, which provides access to
modern statistical and data science methods to answer questions of relevance to IDD, including conducting
data analyses for the Signature IDDRC Research Project; Aim 2, which enhances training in IDD research for
those engaging in data science methods, including implementing a novel internal training grant program
between Data Sciences Institute trainees and the IDDRC; and Aim 3, which supports innovation in health-
related IDD research by facilitating use of large data sets such as the SD, including providing cutting-edge
consultations and tools for working with a large-scale SD IDD-curated database that IDDRC investigators can
use for generating pilot data and conducting studies. Collectively, Core E’s aims, generative work, and
interactions with other IDDRC Cores not only meet the immediate needs of IDDRC investigators, but also
anticipate future ones, by allowing for novel resources, platforms, and methods to be developed. By tackling
and solving complex, multi-modal data science questions, Core E is poised to contribute substantially
over the next 5 years to accelerating scientific discovery to improve the outco...

## Key facts

- **NIH application ID:** 10085556
- **Project number:** 1P50HD103537-01
- **Recipient organization:** VANDERBILT UNIVERSITY MEDICAL CENTER
- **Principal Investigator:** Hakmook Kang
- **Activity code:** P50
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $249,104
- **Award type:** 1
- **Project period:** 2020-08-06 → 2025-05-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10085556

## Citation

> US National Institutes of Health, RePORTER application 10085556, Core E: Data Sciences Core (1P50HD103537-01). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/10085556. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
