Automated Indexing for Publication Types and Study Designs

NIH RePORTER · NIH · R01 · $321,693

Abstract

Retrieving biomedical articles from bibliographic databases requires accurate, detailed indexing of the topics that are discussed as well as their publication types and study designs. It is difficult for indexers to keep up with manual assignments in view of the explosion of published literature. Although NLM has recently employed automatic machine learning methods to index articles according to the major topics discussed, there is still no automatic means of indexing each article across all publication types and study designs.

We have recently created a working prototype tool, Multi-Tagger, which has assigned probabilistic predictive scores for all PubMed articles for 50 different publication types and study designs (collectively, PTs). We now propose to develop Multi-Tagger 2.0, to handle a wider variety of study designs, articles, users and use cases, and to ensure that the data are disseminated in a form that is appropriate to each different type of user. Specifically, we aim to:

Aim 1. Optimize methods for assigning Publication Types and study designs to both PubMed and non-PubMed biomedical articles, preprints and manuscripts.
Aim 2. Evaluate PTs in detail, taking into account model performance, use cases and users.
Aim 3. Optimize dissemination of PT predictive scores by query interface and API.
Aim 4. Explore how to integrate Multi-Tagger with other tools for automating evidence synthesis.

The proposed studies will greatly enhance retrieval of relevant articles and preprints across multiple databases, and will be useful for a wide range of biomedical end-users (clinicians, researchers, students and journal editors) as well as user groups including systematic review groups, bibliographic database managers, those studying preclinical animal models of human disease, and pharmaceutical companies developing new drug treatments. Improving the infrastructure of the biomedical literature will thus have an indirect impact on human health.

Key facts

NIH application ID: 10715907
Project number: 1R01LM014292-01
Recipient: UNIVERSITY OF ILLINOIS AT CHICAGO
Principal Investigator: NEIL R SMALHEISER
Activity code: R01
Funding institute: NIH
Fiscal year: 2023
Award amount: $321,693
Award type: 1
Project period: 2023-08-02 → 2026-07-31