Analysis of Big Data Squared in Biomedical Studies

NIH RePORTER · NIH · R01 · $452,398

Abstract

With the rapid growth of modern technology, many large-scale biomedical studies generate massive datasets with multi-modality imaging, genetic, neurocognitive, and clinical information from increasingly large cohorts. We consider 6 publicly available datasets: the Human Connectome Project (HCP) study, the UK Biobank study, the Pediatric Imaging, Neurocognition, and Genetics study, the Philadelphia Neurodevelopmental Cohort, the Alzheimer's Disease Neuroimaging Initiative study, and the UNC Early Brain Development Study. Simultaneously extracting and integrating rich, heterogeneous neuroimaging and/or genomic information from these big datasets may transform our understanding of how genetic variants impact brain structure and function, cognitive function, and brain-related disease risk across the lifespan. Such understanding is critical for the diagnosis, prevention, and treatment of brain-related disorders (e.g., schizophrenia and Alzheimer's). However, developing methods for the joint analysis of high-dimensional imaging-genetic data, called big data squared, presents major theoretical and computational challenges because of the complexity of imaging phenotypes such as regional volumetric measurements, cortical thickness maps, subcortical structures, structural and functional connectivity matrices, white matter tracts, and activation images.
We will address three imminent challenges in the analysis of big data squared: (CH1) carrying out genome-wide association analysis for functional imaging phenotypes (e.g., white matter tracts, cortical thickness, and subcortical structures); (CH2) carrying out genome-wide association analysis for high-dimensional imaging phenotypes with strong spatial structure (e.g., regional volumetric measurements, and structural and functional connectivity matrices); and (CH3) integrating multi-modality imaging, genetic, and clinical data to predict clinical outcomes (e.g., disease status or time-to-disease onset). To this end, we will (Aim 1) develop a functional genome-wide association analysis (FGWAS) framework for (CH1); (Aim 2) develop a network genome-wide association analysis (NGWAS) framework for (CH2); (Aim 3) develop a multi-scale prediction modeling (MSPM) framework for (CH3); and (Aim 4) verify the efficacy of the newly developed analytical tools using simulations and the 6 extremely valuable imaging genetic datasets. Finally, we will develop companion software for the methods to be developed in this project. The software, which will provide much-needed analytic tools for big data squared, will be disseminated to the public through http://c2s2.yale.edu/software/, https://github.com/BIG-S2, http://odin.mdacc.tmc.edu/bigs2/software.html, and http://www.nitrc.org/. Our novel methods are applicable to a variety of imaging genetic studies for neuropsychiatric disorders, major neurodegenerative diseases, substance use disorders, and normal brain development. A deeper understanding of genetic...

Key facts

NIH application ID
9936251
Project number
5R01MH116527-03
Recipient
YALE UNIVERSITY
Principal Investigator
HEPING ZHANG
Activity code
R01
Funding institute
NIH
Fiscal year
2020
Award amount
$452,398
Award type
5
Project period
2018-06-05 → 2023-02-28