# Intelligent deployment of containerized bioinformatics workflows on the cloud

> **NIH R01** · UNIVERSITY OF WASHINGTON · 2020 · $339,098

## Abstract

PROJECT SUMMARY

Cloud computing has emerged as a promising solution to address the challenges of big data. Public cloud vendors provide computing as a utility, enabling users to pay only for the resources that are actually used. In this application, we will develop methods and tools to enable biomedical researchers to optimize the costs of cloud computing when analyzing biomedical big data. Infrastructure-as-a-Service (IaaS) clouds provide computing as a utility, on demand, to end users, enabling cloud resources to be rapidly provisioned and scaled to meet computational and performance requirements. In addition, dynamic intelligent allocation of cloud computing resources has great potential to both improve performance and reduce hosting costs. Unfortunately, determining the most cost-effective and efficient ways to deploy modules on the cloud is non-trivial, due to a plethora of cloud vendors, each providing different types of virtual machines with different capabilities, performance trade-offs, and pricing structures.

In addition, modern bioinformatics workflows consist of multiple modules, applications, and libraries, each with its own set of software dependencies. Software containers package binary executables and scripts into modules together with their software dependencies. Because containers compartmentalize software dependencies, modules implemented as containers can be mixed and matched to create workflows that give identical results on any platform. The high degree of reproducibility and flexibility of software containers makes them ideal instruments for disseminating complex bioinformatics workflows.

Our overarching goal is to deliver the latest technological advances in containers and cloud computing to a typical biomedical researcher with limited resources who works with big data. Specifically, we will develop a user-friendly drag-and-drop interface to enable biomedical researchers to build and edit containerized workflows. Most importantly, users can choose to deploy and scale selected modules in the workflow on cloud computing platforms in a transparent yet guided fashion to optimize cost and performance. Our aim is to provide a federated approach that leverages resources from multiple cloud vendors.

We have assembled a team of interdisciplinary scientists with expertise in bioinformatics, cloud and distributed computing, and machine learning. As part of this application, we will work closely with end users who routinely generate and analyze RNA-seq data. We will illustrate how our containerized, cloud-enabled methods and tools will benefit bioinformatics analyses.

## Key facts

- **NIH application ID:** 9856493
- **Project number:** 5R01GM126019-03
- **Recipient organization:** UNIVERSITY OF WASHINGTON
- **Principal Investigator:** Ka Yee Yeung-Rhee
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $339,098
- **Award type:** 5
- **Project period:** 2018-02-01 → 2023-01-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/9856493

## Citation

> US National Institutes of Health, RePORTER application 9856493, Intelligent deployment of containerized bioinformatics workflows on the cloud (5R01GM126019-03). Retrieved via AI Analytics 2026-05-23 from https://api.ai-analytics.org/grant/nih/9856493. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
