Cross Modal Retrieval with Querybank Normalisation

descriptionPublicationkeyboard_double_arrow_right Article , Preprint , Conference object 01 Jun 2022Embargo end date: 01 Jan 2021Publisher:IEEEJournal:2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Simion-Vlad Bogolin; Ioana Croitoru; Hailin Jin; Yang Liu 0105; Samuel Albanie;

doi: 10.1109/cvpr52688.2022.00513 , 10.48550/arxiv.2112.12777

arXiv: 2112.12777

Cross Modal Retrieval with Querybank Normalisation

- Summary
- Subjects
- Metrics

Abstract

Profiting from large-scale training datasets, advances in neural architecture design and efficient inference, joint embeddings have become the dominant approach for tackling cross-modal retrieval. In this work we first show that, despite their effectiveness, state-of-the-art joint embeddings suffer significantly from the longstanding "hubness problem" in which a small number of gallery embeddings form the nearest neighbours of many queries. Drawing inspiration from the NLP literature, we formulate a simple but effective framework called Querybank Normalisation (QB-Norm) that re-normalises query similarities to account for hubs in the embedding space. QB-Norm improves retrieval performance without requiring retraining. Differently from prior work, we show that QB-Norm works effectively without concurrent access to any test set queries. Within the QB-Norm framework, we also propose a novel similarity normalisation method, the Dynamic Inverted Softmax, that is significantly more robust than existing approaches. We showcase QB-Norm across a range of cross modal retrieval models and benchmarks where it consistently enhances strong baselines beyond the state of the art. Code is available at https://vladbogo.github.io/QB-Norm/.

Accepted at CVPR 2022

Related Organizations

University of Oxford
United Kingdom
University of Cambridge
United Kingdom
Peking University
China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	28
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Top 10%
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Top 10%
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Top 10%

Found an issue? Give us feedback

28

Top 10%

Green

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Fields of Science

engineering and technology

electrical engineering, electronic engineering, information engineering

Related to Research communities

UArctic