# A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $323,659

## Abstract

New advances in biomedical research have made it possible to collect multiple data “views” (for example, genetic, metabolomic, and clinical data) for a single patient. Such multi-view data promises to offer deeper insights into a patient's health and disease than would be possible if just one data view were available. However, in order to achieve this promise, new statistical methods are needed.

This proposal involves developing statistical methods for the analysis of multi-view data. These methods can be used to answer the following fundamental question: do the data views contain redundant information about the observations, or does each data view contain a different set of information? The answer to this question will provide insight into the data views, as well as insight into the observations. If two data views contain redundant information about the observations, then those two data views are related to each other. Furthermore, if each data view tells the same “story” about the observations, then we can be quite confident that the story is true.

The investigators will develop a unified framework for modeling multi-view data, which will then be applied in a number of settings. In Aim 1, this framework will be applied to multi-view multivariate data (e.g., a single set of patients, with both clinical and genetic measurements), in order to determine whether a single clustering can adequately describe the patients across all data views, or whether the patients cluster separately in each data view. In Aim 2, the framework will be applied to multi-view network data (e.g., a single set of proteins, with both binary and co-complex interactions measured), in order to determine whether the nodes belong to a single set of communities across the data views, or a separate set of communities in each data view. In Aim 3, the framework will be applied to multi-view multivariate data in order to determine whether the observations can be embedded in a single latent space across all data views, or whether they belong to a separate latent space in each data view. In Aims 1–3, the methods developed will be applied to the Pioneer 100 study, and to the protein interactome. In Aim 4(a), the availability of multiple data views will be used in order to develop a method for tuning parameter selection in unsupervised learning. In Aim 4(b), protein communities that were identified in Aim 2 will be validated experimentally. High-quality open source software will be developed in Aim 5.

The methods developed in this proposal will be used to determine whether the findings from multiple data views are the same or different. The application of these methods to multi-view data sets, including the Pioneer 100 study and the protein interactome, will improve our understanding of human health and disease, as well as fundamental biology.

## Key facts

- **NIH application ID:** 9962426
- **Project number:** 5R01GM123993-04
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Jacob Bien
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $323,659
- **Award type:** 5
- **Project period:** 2017-08-01 → 2023-06-30

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9962426

## Citation

> US National Institutes of Health, RePORTER application 9962426, A Modeling Framework for Multi-View Data, with Applications to the Pioneer 100 Study and Protein Interaction Networks (5R01GM123993-04). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/9962426. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*