Cross-modal Networks, Fine-Tuning, Data Augmentation and Dual Softmax Operation for MediaEval NewsImages 2023

Matching images to articles is challenging and can be considered a special version of the cross-media retrieval problem. This notebook paper presents our solution for the MediaEval NewsImages 2023 benchmarking task. We investigate the performance of pre-trained cross-modal networks. Specifically, we investigate two pre-trained CLIP model variations and fine-tune one for domain adaptation. Additionally, we utilize a data augmentation technique and a method for revising the similarities produced by either one of the networks, i.e., a dual softmax operation, to improve our solutions’ performance. We report the official results for our submitted runs and additional experiments we conducted to evaluate our runs internally. We conclude that fine-tuning benefits the performance, and it is important to consider the data’s nature when selecting the appropriate pre-trained CLIP model.
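
The abstract does not spell out the dual softmax operation, so the following is only a minimal sketch of one common formulation, not necessarily the one used in the submitted runs. It assumes a text-by-image similarity matrix and multiplies each entry by a prior: the softmax of that entry's column, taken over all texts. The function names, the `temperature` parameter, and the toy values are illustrative assumptions.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def dual_softmax(sim, temperature=1.0):
    """Revise a text-by-image similarity matrix given as a list of rows.

    Assumed formulation: each entry sim[i][j] is multiplied by the softmax
    of column j evaluated at row i, i.e. how strongly image j prefers
    text i over all other texts.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    columns = [[sim[i][j] for i in range(n_rows)] for j in range(n_cols)]
    priors = [softmax(col, temperature) for col in columns]  # priors[j][i]
    return [
        [sim[i][j] * priors[j][i] for j in range(n_cols)]
        for i in range(n_rows)
    ]


def rank_images(sim_row):
    """Return image indices sorted from most to least similar."""
    return sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)


# Toy example (made-up values): both texts initially prefer image 0.
toy_sim = [
    [0.90, 0.80],  # text 0
    [0.85, 0.30],  # text 1
]
revised = dual_softmax(toy_sim, temperature=0.1)
top_before = [rank_images(row)[0] for row in toy_sim]
top_after = [rank_images(row)[0] for row in revised]
```

In this toy case the revision moves text 0 to image 1, because image 1 has almost no affinity to text 1, while text 1 keeps image 0. The prior penalizes an image that scores highly for many texts, which is the intuition usually given for applying such a revision at retrieval time.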

Related Organizations

Centre for Research and Technology Hellas
Greece

Funded by

EC| CRiTERIA, EC| AI4TRUST