ORena-FOCUS: Foreign Object Contextual Understanding for Safe Surgical AI Challenge

Maier-Hein, Lena; Luttner, Lucas; Godau, Patrick; Weiser, Thomas; Kolbinger, Fiona; Hashimoto, Daniel; Pausch, Thomas; Speidel, Stefanie; Stoyanov, Danail

Data sources: ZENODO

Type: Other research product (Other ORP type). Status: Under curation. Publisher: Zenodo

doi: 10.5281/zenodo.19848528

Abstract

Recent progress in general-domain Vision Language Models (VLMs) has enabled increasingly strong temporal reasoning over extended video streams. However, the surgical AI community has so far lacked a dedicated, standardized challenge that evaluates whether these emerging capabilities translate to real clinical workflows, where critical events unfold over tens of minutes to hours. This gap is important: many clinically meaningful questions in minimally invasive surgery require persistent memory, temporal consistency, and reasoning across long time horizons.

The ORena SAVE FOCUS challenge addresses this unmet need by providing a structured benchmark that targets an urgent patient-safety problem in minimally invasive procedures: ensuring the retrieval of foreign objects, such as sponges and needles, from the abdomen at the end of the operation. Unintentionally leaving foreign objects in the abdomen is a rare but consequential incident in minimally invasive surgery, as they can cause serious complications.

Specifically, FOCUS aims to generate scientific progress by tackling two fundamental research questions (RQs):

RQ1 (Clinical utility): Can VLMs generate clinically meaningful and safety-relevant information about surgical foreign objects? This question targets the clinical value of VLMs for intraoperative quality assurance, focusing on actionable insights related to foreign objects such as sponges, needles, and clips.

RQ2 (Technical limits): What are the current limitations of VLMs in surgical scene reasoning? To answer this question, FOCUS encompasses three tracks that progressively increase the temporal and contextual demands on the model: a FRAME Track (single-image understanding) to assess foundational visual perception and surgical-domain interpretation; a SEGMENT Track (short video segments) to evaluate short-term temporal reasoning, local tracking, and action understanding; and a PROCEDURE Track (long-context up to full procedures reflecting real intraoperative and postoperative query scenarios) to probe long-horizon memory, persistent object tracking across occlusions and scene changes, aggregation over time (e.g., counting and retrieval status), and global reasoning across events. Together, these tracks enable a systematic characterization of where current VLMs succeed and fail as task complexity transitions from instantaneous perception to long-context intraoperative reasoning.

Critically, FOCUS is enabled by a unique dataset that, to the best of our knowledge, is the first challenge resource to provide full-length laparoscopic videos with fine-grained foreign-object annotations at scale. Importantly, the multi-center dataset includes instance-consistent labels, making it possible to evaluate models not only on short-term detection, but also on long-horizon tracking, counting, and retrieval verification across extended procedures. With a total of over 100,000 Visual Question Answering (VQA) pairs obtained from 400 surgical videos acquired from all over the world, FOCUS establishes a standardized benchmark that simultaneously (i) probes the technical limits of long-context VLMs in real-world surgical video and (ii) addresses a concrete quality-assurance objective with direct relevance to intraoperative patient safety.

FOCUS is hosted within ORena, a new umbrella framework for surgical AI competitions inspired by the “Arena” paradigm, but adapted to the specific constraints and opportunities of the Operating Room (OR).