From open-vocabulary to vocabulary-free semantic segmentation

Keywords: FOS: Computer and information sciences, Vocabulary-free; Semantic segmentation; Vision-language models, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Reichard K.; Rizzoli G.; Gasperini S.; Hoyer L.; Zanuttigh P.; Navab N.; Tombari F.

Pattern Recognition Letters

Article . 2025 . Peer-reviewed

License: CC BY

Data sources: Crossref

arXiv.org e-Print Archive

Preprint . 2025

Data sources: arXiv.org e-Print Archive

Padua research Archive (Archivio istituzionale della ricerca - Università di Padova)

Article . 2025

Data sources: Padua research Archive (Archivio istituzionale della ricerca - Università di Padova)

https://dx.doi.org/10.48550/ar...

Article . 2025

License: arXiv Non-Exclusive Distribution

Data sources: Datacite

DBLP

Article

Data sources: DBLP

Article, Preprint . 01 Dec 2025 . Embargo end date: 01 Jan 2025 . Italy . English
Publisher: Elsevier BV
Journal: Pattern Recognition Letters, volume 198, pages 14-21 (ISSN: 0167-8655)

doi: 10.1016/j.patrec.2025.08.025, 10.48550/arxiv.2502.11891

arXiv: 2502.11891

handle: 11577/3565720

Abstract

Open-vocabulary semantic segmentation enables models to identify novel object categories beyond their training data. While this flexibility represents a significant advancement, current approaches still rely on manually specified class names as input, creating an inherent bottleneck in real-world applications. This work proposes a Vocabulary-Free Semantic Segmentation pipeline, eliminating the need for predefined class vocabularies. Specifically, we address the chicken-and-egg problem where users need knowledge of all potential objects within a scene to identify them, yet the purpose of segmentation is often to discover these objects. The proposed approach leverages Vision-Language Models to automatically recognize objects and generate appropriate class names, aiming to solve the challenge of class specification and naming quality. Through extensive experiments on several public datasets, we highlight the crucial role of the text encoder in model performance, particularly when the image text classes are paired with generated descriptions. Despite the challenges introduced by the sensitivity of the segmentation text encoder to false negatives within the class tagging process, which adds complexity to the task, we demonstrate that our fully automated pipeline significantly enhances vocabulary-free segmentation accuracy across diverse real-world scenarios.

Submitted to: Pattern Recognition Letters. Klara Reichard and Giulia Rizzoli contributed equally to this work.
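The two-stage idea in the abstract (a Vision-Language Model proposes class names, which are then handed to an open-vocabulary segmenter) can be illustrated with a minimal sketch. This is not the authors' implementation: `normalize_tags`, `vocabulary_free_segment`, `toy_tagger`, and `toy_segmenter` are hypothetical names, and the "image" is a toy grid of strings standing in for pixel content. The toy segmenter simply falls back to the first class when a pixel's content is missing from the vocabulary, which is an assumption made to show how a false negative in the tagging step propagates.

```python
from typing import Callable, List, Sequence

# Toy stand-in types (assumptions for illustration only):
# each "pixel" stores its true content as a string.
Image = List[List[str]]
LabelMap = List[List[int]]


def normalize_tags(raw_tags: Sequence[str]) -> List[str]:
    """Lowercase, strip, and deduplicate tags while preserving order."""
    seen = set()
    names: List[str] = []
    for tag in raw_tags:
        name = tag.strip().lower()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names


def vocabulary_free_segment(
    image: Image,
    tagger: Callable[[Image], Sequence[str]],
    segmenter: Callable[[Image, List[str]], LabelMap],
) -> List[List[str]]:
    """Stage 1: generate class names. Stage 2: segment with those names."""
    names = normalize_tags(tagger(image))
    if not names:
        raise ValueError("tagger produced no class names")
    label_map = segmenter(image, names)
    # Map class indices back to the generated class names.
    return [[names[idx] for idx in row] for row in label_map]


def toy_tagger(image: Image) -> Sequence[str]:
    """Hypothetical tagger that misses the 'car' (a false negative)."""
    return ["Sky", "road ", "sky"]


def toy_segmenter(image: Image, names: List[str]) -> LabelMap:
    """Hypothetical segmenter: every pixel must receive one of the given names."""
    return [[names.index(p) if p in names else 0 for p in row] for row in image]


image = [["sky", "sky"], ["road", "car"]]
result = vocabulary_free_segment(image, toy_tagger, toy_segmenter)
print(result)
```

In this toy run the "car" pixel is labeled "sky" because the tagger never proposed "car", so the segmenter had no correct class to choose from. It is a simplified picture of why missed tags are costly in a pipeline of this kind.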

Country

Italy

Related Organizations

University of Padua
Italy

Impact by BIP!

selected citations: 0
popularity: Average
influence: Average
impulse: Average

Access routes: Green, hybrid