Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Name: Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Keywords: FOS: Computer and information sciences, Computer Science - Robotics, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Robotics (cs.RO), Machine Learning (cs.LG)

Zhou, Enshen; Su, Qi; Chi, Cheng; Zhang, Zhizheng; Wang, Zhongyuan; Huang, Tiejun; Sheng, Lu; Wang, He


arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.1109/cvpr52...

Article . 2025 . Peer-reviewed

License: STM Policy #29

Data sources: Crossref

https://dx.doi.org/10.48550/ar...

Article . 2024

License: arXiv Non-Exclusive Distribution

Data sources: Datacite


Publication . Article, Preprint . 10 Jun 2025
Embargo end date: 01 Jan 2024
Publisher: IEEE
Journal: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

doi: 10.1109/cvpr52734.2025.00649 , 10.48550/arxiv.2412.04455

arXiv: 2412.04455


Abstract

Automatic detection and prevention of open-set failures are crucial in closed-loop robotic systems. Recent studies often struggle to simultaneously identify unexpected failures reactively after they occur and prevent foreseeable ones proactively. To this end, we propose Code-as-Monitor (CaM), a novel paradigm leveraging the vision-language model (VLM) for both open-set reactive and proactive failure detection. The core of our method is to formulate both tasks as a unified set of spatio-temporal constraint satisfaction problems and use VLM-generated code to evaluate them for real-time monitoring. To enhance the accuracy and efficiency of monitoring, we further introduce constraint elements that abstract constraint-related entities or their parts into compact geometric elements. This approach offers greater generality, simplifies tracking, and facilitates constraint-aware visual programming by leveraging these elements as visual prompts. Experiments show that CaM achieves a 28.7% higher success rate and reduces execution time by 31.8% under severe disturbances compared to baselines across three simulators and a real-world setting. Moreover, CaM can be integrated with open-loop control policies to form closed-loop systems, enabling long-horizon tasks in cluttered scenes with dynamic environments.

Accepted by CVPR 2025. Project page: https://zhoues.github.io/Code-as-Monitor/
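
The monitoring idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: in CaM the monitoring code is generated by a VLM, whereas here it is written by hand, and every name (`Constraint`, `monitor`, `object_in_gripper`, the 5 cm threshold, the frame data) is a hypothetical choice made for illustration. The sketch assumes that entities have already been reduced to compact geometric elements (3D points), and that a constraint is a predicate over those elements that must hold over time.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Elements = Dict[str, Point]  # hypothetical: element name -> tracked 3D point


@dataclass
class Constraint:
    name: str
    check: Callable[[Elements], bool]  # returns True while the constraint holds
    min_violation_frames: int = 1      # temporal part: tolerate brief violations


def monitor(frames: List[Elements],
            constraints: List[Constraint]) -> Optional[Tuple[int, str]]:
    """Return (frame index, constraint name) of the first failure, or None."""
    streak = {c.name: 0 for c in constraints}
    for i, elements in enumerate(frames):
        for c in constraints:
            if c.check(elements):
                streak[c.name] = 0
            else:
                streak[c.name] += 1
                if streak[c.name] >= c.min_violation_frames:
                    return i, c.name
    return None


# Hypothetical spatial constraint: the object point stays within 5 cm
# of the gripper point while it is being carried.
object_in_gripper = Constraint(
    name="object_in_gripper",
    check=lambda e: math.dist(e["gripper"], e["object"]) <= 0.05,
    min_violation_frames=2,
)
constraints = [object_in_gripper]

# Made-up tracked elements for four frames; the object slips after frame 1.
frames = [
    {"gripper": (0.0, 0.0, 0.5), "object": (0.0, 0.0, 0.48)},
    {"gripper": (0.1, 0.0, 0.5), "object": (0.1, 0.0, 0.48)},
    {"gripper": (0.2, 0.0, 0.5), "object": (0.1, 0.0, 0.2)},
    {"gripper": (0.3, 0.0, 0.5), "object": (0.1, 0.0, 0.0)},
]

result = monitor(frames, constraints)
print(result)
```

With these made-up frames the constraint is violated in frames 2 and 3, so the monitor reports the failure at frame 3, once the violation has persisted for two consecutive frames. A proactive check could, under the same assumptions, use a predicate that fires before the failure itself, for example a minimum distance between the carried object and an obstacle.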

Related Organizations

Peking University
China (People's Republic of)


3 Research products

- CLIPOT software on GitHub (IsRelatedTo)
- SMT-bench software on GitHub (IsRelatedTo)
- OmniGibson software on GitHub (IsRelatedTo)

Impact by BIP!

- selected citations: 1. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

