NMRbox information management: CONNJUR and BMRB integration

NIH RePORTER · NIH · P41 · $217,331

Abstract

TRD2 Summary

Biomolecular NMR data computation is complex, requiring both semi-automated and manual processing steps. The NMRbox platform provides easy access to the myriad software tools used during an individual investigation, and the efforts of TRD1 ensure that all of this software functions within the same operating system environment. The goals of TRD2 are to provide information management solutions that promote software interoperability across the NMRbox platform (TRD1) and with the analytics of TRD3; to foster reproducibility through active curation of scientific datasets and workflows; to facilitate the integration of experimental and derived data with information from public databases to generate new knowledge; and to help aggregate this diverse panoply of experimental data and metadata into richer depositions to the appropriate public data repositories. A key component of enriching repository depositions is the collection of provenance metadata, which records the history, or lineage, of how computations were orchestrated and how intermediate and final results were obtained. Knowing how results were obtained is critical to reproducing and building upon scientific claims.

The aims of this project period are fourfold. First, the CONNJUR data model and the CONNJUR software integration environments (integral components of NMRbox) will be extended to capture more metadata, in order to track provenance (data lineage), foster reproducibility, promote software interoperability both within and outside NMRbox, and support a richer metadata set for depositions to public data repositories. Second, the workflow management system will be extended to support high-throughput computational workflows by connecting to powerful job management systems such as HTCondor and DAGMan. Third, the data and process logging capabilities of NMRbox will be used to automatically harvest whatever metadata these mechanisms can capture. Finally, more direct access to BMRB resources will be provided through machine-to-machine services.

Key facts

NIH application ID: 10147733
Project number: 5P41GM111135-07
Recipient: UNIVERSITY OF CONNECTICUT SCH OF MED/DNT
Principal Investigator: JEFFREY C HOCH
Activity code: P41
Funding institute: NIH
Fiscal year: 2021
Award amount: $217,331
Award type: 5
Project period: 2015-09-01 → 2025-05-31