PREMIERE: A PREdictive Model Index and Exchange REpository

NIH RePORTER · NIH · R01 · $673,491 · view on reporter.nih.gov ↗

Abstract

The confluence of new machine learning (ML) data-driven approaches; increased computational power; and access to the wealth of electronic health records (EHRs) and other emergent types of data (e.g., omics, imaging, mHealth) are accelerating the development of biomedical predictive models. Such models range from traditional statistical approaches (e.g., regression) through to more advanced deep learning techniques (e.g., convolutional neural networks, CNNs), and span different tasks (e.g., biomarker/pathway discovery, diagnostic, prognostic). Two issues have become evident: 1) as there are no comprehensive standards to support the dissemination of these models, scientific reproducibility is problematic, given challenges in interpretation and implementation; and 2) as new models are put forth, methods to assess differences in performance, as well as insights into external validity (i.e., transportability), are necessary. Tools moving beyond the sharing of data and model “executables” are needed, capturing the (meta)data necessary to fully reproduce a model and its evaluation. The objective of this R01 is the development of an informatics standard supporting the requisite information for scientific reproducibility for statistical and ML-based biomedical predictive models; from this foundation, we then develop new computational approaches to compare models' performance. We begin by extending the current Predictive Model Markup Language (PMML) standard to fully characterize biomedical datasets and harmonize variable definitions; to elucidate the algorithms involved in model creation (e.g., data preprocessing, parameter estimation); and to explain the validation methodology. Importantly, models in this PMML format will become findable, accessible, interoperable, and reusable (i.e., following FAIR principles). We then propose novel meth- ods to compare and contrast predictive models, assessing transportability across datasets. While metrics exist for comparing models (e.g., c-statistics, calibration), often the required case-level information is not available to calculate these measures. We thus introduce an approach to simulate cases based on a model's reported da- taset statistics, enabling such calculations. Different levels of transportability are then assigned to the metrics, determining the extent to which a selected model is applicable to a given population/cohort (i.e., helping answer the question, can I use this published model with my own data?). We tie these efforts together in our proposed framework, the PREdictive Model Index & Exchange REpository (PREMIERE). We will develop an online portal and repository for model sharing around PREMIERE, and our efforts will include fostering a community of users to guide its development through workshops, model-thons, and other activities. To demonstrate these efforts, we will bootstrap PREMIERE with predictive models from a targeted domain (risk assessment in imaging-based lung cancer screening). Our ...

Key facts

NIH application ID: 10668938
Project number: 5R01EB027650-04
Recipient: UNIVERSITY OF CALIFORNIA LOS ANGELES
Principal Investigator: ALEX BUI
Activity code: R01
Funding institute: NIH
Fiscal year: 2023
Award amount: $673,491
Award type: 5
Project period: 2019-09-15 → 2024-05-31