NIDCR Data Bank

NIH RePORTER · NIH · N02 · $492,433

Abstract

The NIDCR Data Bank project is a generic data management platform intended to move data at scale from on-site data production infrastructure into long-term, STRIDES-based archival cloud storage. The process is designed to be simple enough for data owners to operate independently of technology support staff. It will ensure capture of strong metadata and creation of digital object identifiers, and will provide advanced analytical opportunities for primary, secondary, and tertiary data during ingestion. The platform is also designed around a robust cost-recovery model, which will put data producers and stewards in a strong position to understand the data they retain, make informed value judgements about its retention, and ensure data retention compliance. The platform will be extensible and able to leverage a variety of standard vocabularies and data sharing integrations via API. This fully open-source platform is well aligned to advance ODSS goals. The NIDCR Data Bank was conceived as a data sustainability solution from its inception. The platform is developed on the Microsoft Azure cloud platform and leverages Microsoft resource management best practices and capabilities, which will enable an adopting organization to fully recover the cost of operating the platform from its data producers and stewards. The metadata collected will enable strong data governance, allowing informed data retention decisions to be made based on policy and data valuation. Other design considerations will enable long-term sustainability of data stored in an instance of the Data Bank. For example, Azure data storage provides for automated tiering to less expensive "cold" storage options. Design considerations for this type of infrastructure will accommodate technical challenges such as the differences in data downloading that various tiers of storage may require. NIDCR will employ and maintain the core open-source code base.

Key facts

NIH application ID
10706894
Project number
316201200155W-P00009-759202000001-1
Recipient
LCG SYSTEMS, LLC
Principal Investigator
DAWN HAAG-HATTERER
Activity code
N02
Funding institute
NIH
Fiscal year
2022
Award amount
$492,433
Award type
Project period
2019-12-17 → 2022-12-16