# Analysis of Big Data Squared in Biomedical Studies

> **NIH R01** · YALE UNIVERSITY · 2021 · $435,164

## Abstract

With the rapid growth of modern technology, many large-scale biomedical studies generate massive datasets
with multi-modality imaging, genetic, neurocognitive, and clinical information from increasingly large cohorts.
We consider six publicly available datasets: the Human Connectome Project (HCP) study, the UK Biobank study,
the Pediatric Imaging, Neurocognition, and Genetics study, the Philadelphia Neurodevelopmental Cohort, the
Alzheimer's Disease Neuroimaging Initiative study, and the UNC Early Brain Development Study. Simultaneously
extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from
these big datasets may transform our understanding of how genetic variants impact brain structure and function,
cognitive function, and brain-related disease risk across the lifespan. This is critical for diagnosis, prevention,
and treatment of brain-related disorders (e.g., schizophrenia and Alzheimer's). However, the development of
methods for the joint analysis of high-dimensional imaging-genetic data, called big data squared, presents major
theoretical and computational challenges due to complexities of imaging phenotypes such as regional volumetric
measurements, cortical thickness maps, subcortical structures, structural and functional connectivity matrices,
white matter tracts, and activation images. We will address three imminent challenges in the analysis of big data
squared: (CH1) carrying out genome-wide association analysis for functional imaging phenotypes (e.g., white
matter tracts, cortical thickness, and subcortical structures); (CH2) carrying out genome-wide association
analysis for high-dimensional imaging phenotypes with strong spatial structure (e.g., regional volumetric
measurements, and structural and functional connectivity matrices); and (CH3) integrating multi-modality imaging,
genetic, and clinical data to predict clinical outcomes (e.g., disease status or time-to-disease onset). To this end, we
will (Aim 1) develop a functional genome-wide association analysis (FGWAS) framework for (CH1); (Aim 2) develop a
network genome-wide association analysis (NGWAS) framework for (CH2); (Aim 3) develop a multi-scale prediction modeling
(MSPM) framework for (CH3); and (Aim 4) verify the efficacy of the newly developed analytical tools using
simulations and the six extremely valuable imaging genetic datasets. Finally, we will develop companion software for the
methods developed in this project. The software, which will provide much-needed analytic tools for big
data squared, will be disseminated to the public through http://c2s2.yale.edu/software/, https://github.com/BIG-S2,
http://odin.mdacc.tmc.edu/bigs2/software.html, and http://www.nitrc.org/. Our novel methods are applicable
to a variety of imaging genetic studies for neuropsychiatric disorders, major neurodegenerative diseases,
substance use disorders, and normal brain development. A deeper understanding of genetic...

## Key facts

- **NIH application ID:** 10103853
- **Project number:** 5R01MH116527-04
- **Recipient organization:** YALE UNIVERSITY
- **Principal Investigator:** HEPING ZHANG
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2021
- **Award amount:** $435,164
- **Award type:** 5
- **Project period:** 2018-06-05 → 2023-02-28

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10103853

## Citation

> US National Institutes of Health, RePORTER application 10103853, Analysis of Big Data Squared in Biomedical Studies (5R01MH116527-04). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10103853. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*