Text-Guided Image Clustering

Keywords: Computer Science - Machine Learning, Computer Science - Computer Vision and Pattern Recognition, 102019 Machine learning, 602011 Computational linguistics

Stephan, Andreas; Miklautz, Lukas; Sidak, Kevin; Wahle, Jan Philip; Gipp, Bela; Plant, Claudia; Roth, Benjamin


arXiv.org e-Print Archive

Preprint . 2024

Data sources: arXiv.org e-Print Archive

https://doi.org/10.18653/v1/20...

Article . 2024 . Peer-reviewed

Data sources: Crossref

u:cris

Conference object . 2024

License: CC BY

Data sources: u:cris

Publication: Article, Preprint, Conference object

Date: 01 Jan 2024

Publisher: Association for Computational Linguistics (ACL)

Journal: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

doi: 10.18653/v1/2024.eacl-long.180

arXiv: 2402.02996

Abstract

Image clustering divides a collection of images into meaningful groups, typically interpreted post-hoc via human-given annotations. These annotations are usually in the form of text, raising the question of whether text can serve as an abstraction for image clustering. Current image clustering methods, however, neglect the use of generated textual descriptions. We, therefore, propose Text-Guided Image Clustering, i.e., generating text using image captioning and visual question-answering (VQA) models and subsequently clustering the generated text. Further, we introduce a novel approach to inject task or domain knowledge for clustering by prompting VQA models. Across eight diverse image clustering datasets, our results show that the obtained text representations often outperform image features. Additionally, we propose a counting-based cluster explainability method. Our evaluations show that the derived keyword-based explanations describe clusters better than the respective cluster accuracy suggests. Overall, this research challenges traditional approaches and paves the way for a paradigm shift in image clustering, using generated text.
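
The two steps named in the abstract, clustering generated text and explaining clusters by counting, can be illustrated with a minimal sketch. This is not the authors' implementation: the captions are hypothetical stand-ins for captioning-model output, and the bag-of-words vectors, the small k-means loop with hand-picked initial centroids, the stopword list, and the document-frequency keyword count are all assumptions chosen to keep the example self-contained.

```python
from collections import Counter

# Hypothetical captions, standing in for the output of an image captioning model.
captions = [
    "a dog running on grass",
    "a brown dog playing with a ball",
    "a small dog on the grass",
    "a red car parked on a street",
    "a car driving down the street",
    "an old car on a city street",
]

# Assumed stopword list; a real pipeline would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "on", "with", "down", "of", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one count vector per document."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, init_indices, n_iter=10):
    """Plain k-means; initial centroids are the vectors at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_keywords(docs, labels, top_k=2):
    """Counting-based explanation (assumed form): for each cluster, count in
    how many of its captions each word occurs and keep the top_k words.
    Ties are broken alphabetically."""
    per_cluster = {}
    for d, l in zip(docs, labels):
        per_cluster.setdefault(l, Counter()).update(set(tokenize(d)))
    return {
        l: [w for w, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for l, c in per_cluster.items()
    }


vocab, vectors = bag_of_words(captions)
labels = kmeans(vectors, init_indices=(0, 3))
print(labels)                              # [0, 0, 0, 1, 1, 1]
print(cluster_keywords(captions, labels))  # {0: ['dog', 'grass'], 1: ['car', 'street']}
```

On this toy input the captions separate into a dog group and a car group, and the most frequent words per group serve as a readable cluster description. In practice, stronger text representations and a library clustering routine would likely replace the hand-written parts.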

Comment: Accepted to EACL 2024

Impact by BIP!

	Indicator	Description	Value
	selected citations	These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	2
	popularity	This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence	This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse	This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

