Empirical Validation of GA4GH's Data Use Ontology with NIH Datasets' Data Use Limitations

NIH RePORTER · NIH · U24 · $105,595

Abstract

Project Summary

The goal of this project is to validate the Global Alliance for Genomics and Health (GA4GH) Data Use Ontology (DUO) using more than 700 NIH datasets, to inform its use in scaling data sharing. The DUO was developed to provide a standard vocabulary for describing the permitted secondary uses of genomic data. By standardizing data use language globally, the DUO expedites data sharing that complies with participants’ consent and with ethical and legal regulations. NIH is the largest public funder of biomedical research in the world, and as of March 2021 its database of Genotypes and Phenotypes (dbGaP) held more than 7,582 datasets covering more than 330,000 variables. This makes the NIH-funded datasets housed in dbGaP an ideal test case for validating the DUO. In this project we will work with NIH data access committees (DACs) to align their datasets’ existing data use limitations (DULs) with data use terms from the DUO. We will identify which DULs and DUO terms have parity and where gaps exist between the two, and discuss with the DACs whether and how to address those gaps. We aim to summarize these key findings and recommendations in a formal publication, following on the initial DUO validation of 120 datasets at the Broad Institute, and in a communication to the leadership of the GA4GH Data Use & Research Identities work stream (of which Mr Lawson is a member) for their consideration in evolving the DUO. Thereafter, we will advance discussions with the NIH DACs to ensure that every NIH DUL can be mapped to machine-readable term(s) in the DUO. Once this alignment is complete, whether through adjustments to the DUO, to the NIH DULs, or both, we will tag all datasets with the appropriate DUO terms in the Broad Institute’s Data Use Oversight System (DUOS) Dataset Catalog. The DUOS Dataset Catalog will be upgraded to support querying datasets by their permitted data uses (e.g., DULs), so that researchers can filter out datasets they are unlikely to be approved to access.
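A minimal Python sketch of the parity-and-gap step, assuming each dataset's DUL is expressed as a hyphenated dbGaP-style consent code such as `DS-CA-IRB-NPU`. The crosswalk `DUL_TO_DUO` and the function `map_consent_code` are hypothetical illustrations, not the project's actual mapping. The table is deliberately partial, its DUO identifiers should be checked against the current DUO release, the token following `DS` is assumed to be a disease abbreviation, and any token without an entry is reported as a gap to discuss with the DAC.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately partial crosswalk from consent-code tokens to DUO
# term IDs. Verify identifiers against the current DUO release before reuse.
DUL_TO_DUO: Dict[str, str] = {
    "GRU": "DUO:0000042",  # general research use
    "HMB": "DUO:0000006",  # health or medical or biomedical research
    "DS": "DUO:0000007",   # disease specific research
    "IRB": "DUO:0000021",  # ethics approval required
    "NPU": "DUO:0000045",  # not for profit organisation use only
    "PUB": "DUO:0000019",  # publication required
}


def map_consent_code(code: str) -> Tuple[List[str], List[str], List[str]]:
    """Split a hyphenated consent code into (DUO term IDs, disease qualifiers, gaps).

    A "gap" is any token that has no entry in DUL_TO_DUO.
    """
    tokens = code.strip().upper().split("-")
    terms: List[str] = []
    diseases: List[str] = []
    gaps: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in DUL_TO_DUO:
            terms.append(DUL_TO_DUO[tok])
            # Assumption: the token right after DS names the permitted disease.
            if tok == "DS" and i + 1 < len(tokens):
                diseases.append(tokens[i + 1])
                i += 1  # consume the disease token as well
        else:
            gaps.append(tok)  # no DUO counterpart in this partial table
        i += 1
    return terms, diseases, gaps
```

For example, `map_consent_code("HMB-XYZ")` returns `(['DUO:0000006'], [], ['XYZ'])`, where `XYZ` stands in for a hypothetical DUL with no DUO counterpart. Collecting the gap lists across all datasets yields the kind of inventory the DAC discussions would work through.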
The DUOS decision-support algorithm will match the DUO terms on data access requests (DARs) to the DUO terms on the dbGaP datasets. DUOS will use these matches to triage requests automatically and to propose whether to grant access, a proposal that DACs can easily review. With the DUL inputs accurately calibrated and their functionality enabled in DUOS, the NIH DACs and the Office of Science Policy would be well positioned to codify their DAR review policy into the DUOS decision-support algorithm. This would enable automated decisions on typical DARs and significantly reduce both the burden on DACs and the DAR turnaround time for researchers.
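A minimal sketch of the matching-and-triage idea, under a deliberately simplified model. Here a dataset carries one primary permission (`GRU`, `HMB`, or `DS` with a disease) plus optional modifiers, and a request declares a research purpose. The class names, purpose labels, and the `propose` function are hypothetical and do not reflect the actual DUOS algorithm. Only the `NPU` modifier is checked, and diseases are compared by exact string match, whereas a real matcher would likely need to account for disease hierarchies. The output is a proposal with reasons, left for a DAC to confirm.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class DatasetTerms:
    """DUO-style terms tagged on a dataset (hypothetical, simplified model)."""
    permission: str                          # "GRU", "HMB", or "DS"
    disease: Optional[str] = None            # disease qualifier when permission == "DS"
    modifiers: FrozenSet[str] = frozenset()  # e.g. frozenset({"NPU"})


@dataclass(frozen=True)
class RequestTerms:
    """Research purpose declared on a data access request (hypothetical model)."""
    purpose: str                  # "GENERAL", "HMB", "DISEASE", or "ANCESTRY"
    disease: Optional[str] = None  # required when purpose == "DISEASE"
    for_profit: bool = False


def propose(dataset: DatasetTerms, request: RequestTerms) -> dict:
    """Return a proposed decision plus reasons; a DAC reviews the proposal."""
    reasons: List[str] = []
    if dataset.permission == "GRU":
        permitted = True  # general research use allows any research purpose
    elif dataset.permission == "HMB":
        permitted = request.purpose in {"HMB", "DISEASE"}
        if not permitted:
            reasons.append("HMB permits only health/medical/biomedical research")
    elif dataset.permission == "DS":
        # Simplification: exact match only, no disease-ontology hierarchy.
        permitted = request.purpose == "DISEASE" and request.disease == dataset.disease
        if not permitted:
            reasons.append(f"DS limits use to research on {dataset.disease}")
    else:
        permitted = False
        reasons.append(f"unrecognized permission {dataset.permission}; route to DAC")
    # Modifier check: not-for-profit use only.
    if "NPU" in dataset.modifiers and request.for_profit:
        permitted = False
        reasons.append("NPU: not-for-profit use only")
    return {"proposal": "approve" if permitted else "deny", "reasons": reasons}
```

Because each call returns a proposal and its reasons rather than a final verdict, the same function could also drive a catalog filter: run it against every dataset for a researcher's planned purpose and hide the datasets that come back as `"deny"`.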

Key facts

NIH application ID: 10367297
Project number: 3U24HG011025-01A1S1
Recipient: BROAD INSTITUTE, INC.
Principal Investigator: HEIDI L REHM
Activity code: U24
Funding institute: NIH
Fiscal year: 2021
Award amount: $105,595
Award type: 3
Project period: 2021-08-03 → 2026-01-31