Analyzing Vision Transformers for Image Classification in Class Embedding Space

Publication: Conference object, Article, Preprint
Date: 01 Jan 2023
Embargo end date: 01 Jan 2023
Language: English
Publisher: Zenodo
Journal: CoRR, volume abs/2310.18969

Authors: Martina G. Vilas; Timothy Schaumlöffel; Gemma Roig

doi: 10.5281/zenodo.10255027, 10.5281/zenodo.10255028, 10.48550/arxiv.2310.18969

arXiv: 2310.18969

Abstract

Despite the growing use of transformer models in computer vision, a mechanistic understanding of these networks is still needed. This work introduces a method to reverse-engineer Vision Transformers trained to solve image classification tasks. Inspired by previous research in NLP, we demonstrate how the inner representations at any level of the hierarchy can be projected onto the learned class embedding space to uncover how these networks build categorical representations for their predictions. We use our framework to show how image tokens develop class-specific representations that depend on attention mechanisms and contextual information, and give insights on how self-attention and MLP layers differentially contribute to this categorical composition. We additionally demonstrate that this method (1) can be used to determine the parts of an image that would be important for detecting the class of interest, and (2) exhibits significant advantages over traditional linear probing approaches. Taken together, our results position our proposed framework as a powerful tool for mechanistic interpretability and explainability research.
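The projection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy dimensions, and the orthonormal class embedding matrix are assumptions made for illustration, and any normalization a real Vision Transformer may apply before its classification head is omitted. The idea shown is only that a hidden token representation from any layer can be multiplied by the class embedding matrix to obtain per-class scores, from which the rank of a class of interest can be read off for each token.

```python
import numpy as np


def project_to_class_space(token_states, class_embeddings):
    """Project hidden token states onto the class embedding space.

    token_states:     (num_tokens, d_model) hidden states from some layer
    class_embeddings: (num_classes, d_model) rows are class embeddings
                      (hypothetical stand-in for a classification head)
    Returns (num_tokens, num_classes) class scores per token.
    """
    return token_states @ class_embeddings.T


def class_rank(logits, target_class):
    """Rank of target_class for each token (0 means highest-scoring class)."""
    order = np.argsort(-logits, axis=-1)            # classes sorted by score, descending
    return np.argmax(order == target_class, axis=-1)  # position of the target class


# Toy setup (assumed values): 3 classes in an 8-dimensional embedding space.
class_embeddings = np.eye(3, 8)  # orthonormal rows keep the example easy to verify

# Three toy "image tokens" with differing alignment to class 1.
token_states = np.stack([
    class_embeddings[1],                              # aligned with class 1
    class_embeddings[0] + 0.5 * class_embeddings[1],  # mostly class 0
    class_embeddings[2] + 0.5 * class_embeddings[0],  # unrelated to class 1
])

logits = project_to_class_space(token_states, class_embeddings)
ranks = class_rank(logits, target_class=1)
print(ranks)
```

A lower rank means the token's representation is more aligned with the class of interest. Repeating this over the hidden states of successive layers would give a per-layer view of how class-specific each token becomes.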

NeurIPS 2023

Related Organizations

Goethe University Frankfurt
Germany

Keywords

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Impact by BIP!

	selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.

Open Access route: Green

Related to Research communities

Hessian Open Science Portal