AI-assisted Annotation of Minimally Invasive Surgical Video for Building Information-Rich Datasets and Data-Efficient Learning Systems

NIH RePORTER · NIH · R21 · $393,525

Abstract

Minimally invasive surgery (MIS) has gained popularity in recent years by offering increased manipulation dexterity in hard-to-reach areas through small incisions, thereby reducing patient pain, complication rates, and recovery time. However, MIS often involves a limited field of view, complex hand-eye coordination, and limited depth and tactile information. Without sufficient timely and context-aware feedback, these procedures are difficult to learn and, even once learned, remain challenging and risky. Artificial intelligence and machine learning technologies have been proposed to understand the activities and progress of a surgery and so provide valuable feedback. However, the scarcity of large-scale, high-quality data has been a consistent bottleneck to new developments and to the inclusion of new surgical disciplines and procedures. Modern surgery captures a large amount of video data at high resolution and frame rate, so labeling the data densely is impractical. Publicly available surgical datasets remain far smaller and less representative than non-surgical datasets, owing to the need for costly hand-labeling by clinical experts. This project proposes to develop autonomous algorithms for rapid, cost-effective surgical video frame annotation. Specifically, the approach addresses methods to automatically compress lengthy surgical video data to only the key informative time segments based on surgical context (Aim 1) and to find within these segments the most informative video keyframes of specific anatomy/tissues, to be used eventually for downstream tasks such as annotation and/or image guidance (Aim 2). The expected outcome is a set of algorithms that automatically identify informative video segments and select the most information-rich keyframes to extract for hand-labeled annotation.
These keyframes may then be used to train neural network-based semantic segmentation models that perform pixel-level recognition of anatomy in the scene. A colorectal robotic surgery dataset of low anterior resection will be built as part of the effort; recognizing anatomy in this procedure is of significant clinical importance for avoiding damage to nerves, which could lead to erectile dysfunction. We will compare the performance of segmentation algorithms trained with our proposed sparse annotations against algorithms trained with denser (but more costly) annotations. If successful, the project will generate a solution to the lack of high-quality data. The resulting preliminary data, and the algorithms trained on it, will provide rich semantic information that could enhance surgical safety, improve treatment outcomes, and reduce healthcare expenses in the future.
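
As a rough illustration of the keyframe-selection idea in Aim 2, the sketch below picks a small, diverse subset of frames by greedy farthest-point sampling over per-frame feature vectors. The abstract does not specify the project's actual method, so this is only a stand-in: the function name `select_keyframes`, the toy two-dimensional features, and the use of Euclidean distance as a proxy for "informativeness" are all assumptions made for illustration.

```python
import math


def select_keyframes(features, k):
    """Pick up to k mutually distant frames (hypothetical helper, not the project's method).

    features: list of equal-length numeric tuples, one per video frame
              (assumed to come from some upstream feature extractor).
    Returns the chosen frame indices in temporal order.
    """
    if k <= 0 or not features:
        return []
    k = min(k, len(features))

    selected = [0]  # starting from the first frame is an arbitrary choice
    chosen = {0}
    # Distance from every frame to its nearest already-selected frame.
    min_dist = [math.dist(f, features[0]) for f in features]

    while len(selected) < k:
        # Take the unselected frame that is farthest from the current selection.
        candidates = [i for i in range(len(features)) if i not in chosen]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[nxt]))

    return sorted(selected)


# Toy features: frames 0-1 and 2-3 are near-duplicates, frame 4 is distinct.
frames = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
print(select_keyframes(frames, 3))  # [0, 2, 4]
```

In this toy run the near-duplicate frames 1 and 3 are skipped, so an annotator would label three frames instead of five. A real system would presumably score frames on surgical context and anatomy visibility rather than raw feature distance.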

Key facts

NIH application ID: 10953417
Project number: 1R21EB036284-01
Recipient: UNIVERSITY OF CALIFORNIA, SAN DIEGO
Principal Investigator: Shanglei Liu
Activity code: R21
Funding institute: NIH
Fiscal year: 2024
Award amount: $393,525
Award type: 1
Project period: 2024-09-20 → 2026-09-19