ABSTRACT Minimally invasive surgery (MIS) has gained popularity in recent years by offering increased manipulation dexterity in hard-to-reach areas through small incisions, thereby reducing patient pain, complication rates, and recovery time. However, MIS often involves a limited field of view, complex hand-eye coordination, and limited depth and tactile information. Without sufficient timely and context-aware feedback, these procedures are difficult to learn and, even once learned, remain challenging and risky. Artificial intelligence and machine learning technologies have been proposed to understand the activities and progress of a surgery and thereby provide valuable feedback. However, the lack of large-scale, high-quality data has been a consistent bottleneck to new developments and to the inclusion of new surgical disciplines and procedures. In modern surgery, a vast amount of video data is captured at high resolution and frame rate, so it is impractical to label the data densely. As a result, existing publicly available surgical datasets remain small in size and representativeness compared to non-surgical datasets, owing to the need for costly hand-labeling by clinical experts. This project proposes to develop autonomous algorithms for rapid, cost-effective surgical video frame annotation. Specifically, the approach addresses methods to automatically compress lengthy surgical video data to only key informative time segments based on surgical context (Aim 1) and to find within these segments the most informative video keyframes of specific anatomy/tissues, to be used eventually for downstream tasks such as annotation and/or image guidance (Aim 2). The expected outcome will be algorithms that automatically identify informative video segments and select the most information-rich keyframes to extract for hand-labeled annotation.
These keyframes may then be used to train neural network-based semantic segmentation models that perform pixel-level recognition of anatomy in the scene. A colorectal robotic surgery dataset of low anterior resection will be built as part of the effort; in this procedure, recognizing anatomy is of significant clinical importance for avoiding nerve injury that could lead to erectile dysfunction. We will compare the performance of segmentation algorithms trained with our proposed sparse annotations against algorithms trained with denser (but more costly) annotations. If successful, the project will generate a solution to address the lack of high-quality data. The resulting preliminary data, together with the algorithms trained on it, will provide rich semantic information to potentially enhance surgical safety, improve treatment outcomes, and reduce healthcare expenses in the future.