Statistical and Machine Learning Methods to Address Biomedical Challenges for Integrating Multi-view Data

NIH RePORTER · NIH · R35 · $351,514

Abstract

Project Summary: Many diseases are complex, heterogeneous conditions that affect multiple organs in the body and depend on the interplay among genetic, cellular, molecular, and environmental factors. It is therefore not surprising that the pathogenesis of many complex diseases remains elusive and that therapeutic targets are lacking. Traditional approaches that focus on a small number of molecules (e.g., genes or metabolites) or a single type of data (e.g., clinical or genetic) cannot address this complexity and heterogeneity. Integrative or systems biology approaches and network analysis can be used to leverage the strengths of data from multiple sources (e.g., genomics, metabolomics, epidemiology, clinical data) to achieve new insights into the pathobiology of complex diseases. Recent technological advances have enabled the production of vast amounts of diverse but related, information-rich data. These data offer remarkable opportunities to understand the biological processes involved in complex diseases and to transform medicine, but they also present significant analytical challenges, including how to effectively synthesize information from tens of thousands of data points to identify important biomarkers with the potential to serve as therapeutic targets. To address these challenges, we will develop and apply a suite of novel, robust, and powerful statistical and machine learning methods for the integration and interpretation of cross-sectional and longitudinal data from multiple sources. These models will also be used to define subpopulations of patients who have different prognoses or require different therapeutic approaches based on data from different sources. Further, we will make use of recent advances in network theory to model the complex multilateral relationships in molecular data from multiple sources.
The proposed methods will be applied to several publicly available datasets and cohorts to ensure that our work generalizes to other datasets and cohorts, thereby increasing the long-term impact of our research. The proposed research will also contribute valuable statistical and machine learning algorithms that will be broadly applicable to data from multiple sources and multiple cohorts and will be made available to the public free of charge.

Key facts

NIH application ID
10274846
Project number
1R35GM142695-01
Recipient
UNIVERSITY OF MINNESOTA
Principal Investigator
Sandra E Safo
Activity code
R35
Funding institute
NIH
Fiscal year
2021
Award amount
$351,514
Award type
1
Project period
2021-09-23 → 2026-06-30