# Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications

> **NIH R01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2020 · $430,738

## Abstract

Summary: Viruses are ubiquitous in almost every ecological environment, including the human
body, water, and soil. They play important roles in the normal function of the human microbiome.
Many viruses have been shown to be associated with human diseases. However, our
understanding of the roles of viruses in ecological communities is very limited. Recent
technological and computational advances make it possible to have a deep understanding of
the roles of viruses in public health and the environment. Metagenomics studies from various
environments, including the Human Microbiome Project (HMP), the global ocean, and the Earth
Microbiome Project, have generated large amounts of short-read data. Viruses are present in
most of these metagenomic data sets, and their hosts are unknown. In this proposal, the
investigators will develop computational approaches for the identification of viral sequences
from metagenomic data sets and for the study of virus-host interactions. For the identification of
viral sequences from metagenomics samples, novel statistical measures using word patterns
will first be developed. Second, a unified naïve Bayesian integrative approach that combines
information from word patterns, gene directionality, and gene annotation will be studied. Third,
the identified viral sequences from metagenomes will be further assembled to construct
complete viral genomes using a novel binning approach to be developed by the investigators.
Finally, the remaining reads will be assigned to the corresponding bins. For the study of virus-
host interactions, computational methods to estimate the reliability of virus-host interactions
from high-throughput experiments will first be developed. Then machine learning approaches
will be developed to predict viruses infecting certain hosts. Finally, a network logistic regression
approach will be developed to predict virus-host interactions. These computational approaches
for the identification of viral sequences and for predicting virus-host interactions will be applied
to a public liver cirrhosis data set and a unique metagenomics data set to understand how
metagenomes change with health status, to identify viruses and virus-host interactions associated
with disease status, and to accurately predict disease status using bacteria, viruses, and
virus-host interactions.
The developed computational methods will also be used to analyze metagenomic data from
various locations based on the TARA ocean data and a unique time series data set to understand
how environmental factors affect virus abundance and virus-host interactions. Some of the
predictions will be experimentally validated. Software derived from the proposal will be
developed and freely distributed to the scientific community.
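
The abstract does not specify which word-pattern statistics will be used. As a minimal sketch, assuming "word patterns" refers to k-mer (length-k word) frequency profiles, the example below computes such a profile for a DNA sequence and compares two profiles with a Manhattan distance. The function names `kmer_profile` and `manhattan_distance`, the default `k`, and the choice of distance are illustrative assumptions, not the investigators' method.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> dict[str, float]:
    """Relative frequency of every DNA word of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Restrict to words over A, C, G, T so ambiguity codes such as N are ignored.
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: counts[w] / total for w in words}


def manhattan_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute frequency differences between two profiles built with the same k."""
    return sum(abs(p[w] - q[w]) for w in p)


if __name__ == "__main__":
    # Toy sequences for illustration only; not real viral or host genomes.
    contig = "ACGTACGTTGCA"
    candidate_host = "ACGTTTGCAACG"
    d = manhattan_distance(kmer_profile(contig), kmer_profile(candidate_host))
    print(f"profile distance: {d:.3f}")
```

Under this assumption, a smaller distance between a viral contig's profile and a candidate host's profile would be read as weak evidence of similar sequence composition. It is only one of many possible word-pattern measures.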

## Key facts

- **NIH application ID:** 9899262
- **Project number:** 5R01GM120624-04
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Nathan Ahlgren
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $430,738
- **Award type:** 5
- **Project period:** 2017-04-15 → 2022-03-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9899262

## Citation

> US National Institutes of Health, RePORTER application 9899262, Computational Studies of Virus-host Interactions Using Metagenomics Data and Applications (5R01GM120624-04). Retrieved via AI Analytics 2026-05-22 from https://api.ai-analytics.org/grant/nih/9899262. Licensed CC0.

