# Automated Indexing for Publication Types and Study Designs

> **NIH R01** · UNIVERSITY OF ILLINOIS AT CHICAGO · 2023 · $321,693

## Abstract

Retrieving biomedical articles from bibliographic databases requires accurate, detailed indexing of the topics that are discussed as well as their publication types and study designs. It is difficult for indexers to keep up with manual assignments in view of the explosion of published literature. Although NLM has recently employed automatic machine learning methods to index articles according to the major topics discussed, there is still no automatic means of indexing each article across all publication types and study designs.

We have recently created a working prototype tool, Multi-Tagger, which has assigned probabilistic predictive scores for all PubMed articles for 50 different publication types and study designs (collectively, PTs). We now propose to develop Multi-Tagger 2.0, to handle a wider variety of study designs, articles, users and use cases, and to ensure that the data are disseminated in a form that is appropriate to each different type of user. Specifically, we aim to:

- Aim 1. Optimize methods for assigning Publication Types and study designs to both PubMed and non-PubMed biomedical articles, preprints and manuscripts.
- Aim 2. Evaluate PTs in detail, taking into account model performance, use cases and users.
- Aim 3. Optimize dissemination of PT predictive scores by query interface and API.
- Aim 4. Explore how to integrate Multi-Tagger with other tools for automating evidence synthesis.

The proposed studies will greatly enhance retrieval of relevant articles and preprints across multiple databases, and will be useful for a wide range of biomedical end-users (clinicians, researchers, students and journal editors) as well as user groups including systematic review groups, bibliographic database managers, those studying preclinical animal models of human disease, and pharmaceutical companies developing new drug treatments. Improving the infrastructure of the biomedical literature will thus indirectly impact on human health.
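
To make the idea of probabilistic PT scores concrete, the sketch below shows one way a downstream user might filter articles by a score threshold. It is only an illustration: the data layout, the article identifiers, the score values, and the function name `articles_with_type` are all assumptions made for this example, not Multi-Tagger's actual output format or API.

```python
from typing import Dict, List

# Hypothetical per-article scores: article ID -> {publication type -> probability}.
# Every value here is invented for illustration; none is real Multi-Tagger output.
SCORES: Dict[str, Dict[str, float]] = {
    "PMID-A": {"Randomized Controlled Trial": 0.92, "Case Report": 0.03},
    "PMID-B": {"Randomized Controlled Trial": 0.40, "Case Report": 0.55},
    "PMID-C": {"Randomized Controlled Trial": 0.08, "Case Report": 0.81},
}


def articles_with_type(
    scores: Dict[str, Dict[str, float]],
    pub_type: str,
    threshold: float = 0.5,
) -> List[str]:
    """Return article IDs whose score for pub_type meets the threshold, highest score first."""
    # A missing publication type is treated as a score of 0.0.
    hits = [(article_id, pts.get(pub_type, 0.0)) for article_id, pts in scores.items()]
    hits = [(article_id, score) for article_id, score in hits if score >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [article_id for article_id, _ in hits]


if __name__ == "__main__":
    print(articles_with_type(SCORES, "Randomized Controlled Trial"))
    print(articles_with_type(SCORES, "Case Report", threshold=0.5))
```

A lower threshold favors recall (useful for systematic review screening), while a higher one favors precision; the right cutoff would depend on the use case and is not specified in the abstract.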

## Key facts

- **NIH application ID:** 10715907
- **Project number:** 1R01LM014292-01
- **Recipient organization:** UNIVERSITY OF ILLINOIS AT CHICAGO
- **Principal Investigator:** NEIL R SMALHEISER
- **Activity code:** R01
- **Funding institute:** NIH
- **Fiscal year:** 2023
- **Award amount:** $321,693
- **Award type:** 1
- **Project period:** 2023-08-02 → 2026-07-31

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10715907

## Citation

> US National Institutes of Health, RePORTER application 10715907, Automated Indexing for Publication Types and Study Designs (1R01LM014292-01). Retrieved via AI Analytics 2026-05-24 from https://api.ai-analytics.org/grant/nih/10715907. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
