# Creating Scalable, Reliable, Sustainable Infrastructure for FAIR Data

> **NIH U01** · UNIVERSITY OF SOUTHERN CALIFORNIA · 2020 · $175,000

## Abstract

The goal of the FaceBase III Hub is to create a FAIR (Findable, Accessible, Interoperable, and
Reusable) data repository that serves the entire community of dental and craniofacial researchers
by sharing diverse data related to craniofacial development and dysmorphology. To meet this goal,
FaceBase is built on Deriva, an open-source
data management system designed with FAIR data principles in mind. This platform has
allowed FaceBase to evolve with changing requirements for data on new experimental
methodologies and instruments, additional model organisms, cell characterization, integration of
computational pipelines, and visualization interfaces.
Currently, we run Deriva on private and public clouds in a “data center-in-the-cloud” mode; that
is, we treat the cloud like a traditional remote data center, using it to run virtual machine images
and host conventional data storage. However, cloud platforms such as Amazon Web Services (AWS)
offer a wide range of cloud-native services beyond virtual machines that, if fully leveraged, would
substantially improve key aspects of Deriva and directly benefit FaceBase. Hence,
we propose to enhance Deriva for cloud-based operations, addressing three key aspects of Deriva
that matter to FaceBase and the other NIH communities Deriva supports: scalability, reliability, and
sustainability. Specifically, we plan to use AWS-native services to improve Deriva's scalability (Aim 1),
decouple Deriva services to run in containerized execution environments to ensure its reliability
(Aim 2), and develop cost management dashboards to monitor and predict costs of operating in
the cloud to achieve sustainability (Aim 3). AWS-native services are fully managed and highly
scalable, and they offload much of the overhead of system operations and maintenance.
Improvements in Deriva scalability, reliability, and sustainability achieved by these Aims will
allow the FaceBase Hub to provide the growing community of data contributors and users with
better service. In addition, many other user communities such as GUDMAP, (Re)Building a
Kidney, the Kidney Precision Medicine Project (NIDDK) and the Common Fund Data
Environment (OD) rely on Deriva, and all of the improvements resulting from these Aims would
yield a direct and immediate benefit to thousands of additional users.

## Key facts

- **NIH application ID:** 10166440
- **Project number:** 3U01DE028729-02S2
- **Recipient organization:** UNIVERSITY OF SOUTHERN CALIFORNIA
- **Principal Investigator:** Yang Chai
- **Activity code:** U01
- **Funding institute:** NIH
- **Fiscal year:** 2020
- **Award amount:** $175,000
- **Award type:** 3 (supplement)
- **Project period:** 2019-08-01 → 2021-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10166440

## Citation

> US National Institutes of Health, RePORTER application 10166440, Creating Scalable, Reliable, Sustainable Infrastructure for FAIR Data (3U01DE028729-02S2). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10166440. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
