# AI-assisted Annotation of Minimally Invasive Surgical Video for Building Information-Rich Datasets and Data-Efficient Learning Systems

> **NIH R21** · UNIVERSITY OF CALIFORNIA, SAN DIEGO · 2024 · $393,525

## Abstract

Minimally invasive surgery (MIS) has gained popularity in recent years by offering increased manipulation dexterity
in hard-to-reach areas through small incisions, thereby reducing patient pain, complication rate, and recovery
time. However, MIS often involves a limited field of view, complex hand-eye coordination, and limited depth and
tactile information. Without sufficient timely and context-aware feedback, these procedures are difficult to learn
and, even then, still challenging and risky. Artificial intelligence and machine learning technologies have been
proposed to understand the activities and progress of the surgery and so provide valuable feedback. However,
large-scale, high-quality data has been a consistent bottleneck to new developments and the inclusion
of new surgical disciplines and procedures. Modern surgery captures an enormous amount of video data
at high resolution and frame rate, so labeling the data densely is impractical. Existing publicly
available surgical datasets are still dismally small in size and representativeness compared to non-surgical
datasets, owing to the need for costly hand-labeling by clinical experts. This project proposes to develop
autonomous algorithms for rapid, cost-effective surgical video frame annotation. Specifically, the approach
addresses methods to automatically compress lengthy surgical video data to only key informative time segments
based on surgical context (Aim 1) and find within these segments the most informative video keyframes of specific
anatomy/tissues to be used eventually for downstream tasks such as annotation and/or image-guidance (Aim 2).
The expected outcome is a set of algorithms that automatically identify informative video segments and select the
most information-rich keyframes to extract for hand-labeled annotation. These keyframes may then be used to
train neural network-based semantic segmentation models that perform pixel-level recognition
of anatomy in the scene. A colorectal robotic surgery dataset of low anterior resection will be built as part of
the effort; in this procedure, recognizing anatomy is clinically important for avoiding nerve injury that could
lead to erectile dysfunction. We will compare the performance of the segmentation algorithms trained with our
proposed sparse annotations against algorithms trained with denser (but more costly) annotations. If successful,
the project will generate a solution to address the lack of high-quality data. The preliminary data obtained,
together with the algorithms trained on it, will provide rich semantic information that could enhance surgical
safety, improve treatment outcomes, and reduce healthcare expenses in the future.
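
The abstract does not specify how "information-rich" keyframes are chosen. As a rough illustration only, the sketch below assumes each frame in a segment has already been reduced to a numeric descriptor (for example a color histogram) and uses greedy farthest-point selection to pick a small, diverse subset to send for hand-labeling. The function name `select_keyframes`, the descriptors, and the diversity criterion are all hypothetical and are not taken from the project.

```python
from math import dist


def select_keyframes(features, k):
    """Pick up to k frame indices whose descriptors are mutually far apart.

    Greedy farthest-point selection: a simple diversity heuristic standing in
    for the project's (unspecified) informativeness criterion.
    """
    if not features or k <= 0:
        return []
    k = min(k, len(features))
    selected = [0]  # start from the first frame (an arbitrary choice)
    # Distance from every frame to its nearest already-selected frame.
    nearest = [dist(f, features[0]) for f in features]
    while len(selected) < k:
        # Choose the frame farthest from everything selected so far.
        idx = max(range(len(features)), key=nearest.__getitem__)
        if nearest[idx] == 0:
            break  # every remaining frame duplicates a selected one
        selected.append(idx)
        nearest = [min(d, dist(f, features[idx]))
                   for d, f in zip(nearest, features)]
    return sorted(selected)


# Hypothetical 2-D descriptors, one per video frame in a segment.
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 9.0)]
keyframes = select_keyframes(frames, 3)
print(keyframes)  # indices of the frames to send for annotation
```

Near-duplicate frames (indices 0 and 1, or 2 and 3 in the toy data) contribute at most one keyframe each, which is the property a sparse annotation budget would want from any such selector.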

## Key facts

- **NIH application ID:** 10953417
- **Project number:** 1R21EB036284-01
- **Recipient organization:** UNIVERSITY OF CALIFORNIA, SAN DIEGO
- **Principal Investigator:** Shanglei Liu
- **Activity code:** R21
- **Funding institute:** NIH
- **Fiscal year:** 2024
- **Award amount:** $393,525
- **Award type:** 1
- **Project period:** 2024-09-20 → 2026-09-19

## Primary source

NIH RePORTER: https://reporter.nih.gov/project-details/10953417

## Citation

> US National Institutes of Health, RePORTER application 10953417, AI-assisted Annotation of Minimally Invasive Surgical Video for Building Information-Rich Datasets and Data-Efficient Learning Systems (1R21EB036284-01). Retrieved via AI Analytics 2026-05-25 from https://api.ai-analytics.org/grant/nih/10953417. Licensed CC0.

---

*[NIH grants dataset](/datasets/nih-grants) · CC0 1.0*
