Incorporating Textual Similarity in Video Captioning Schemes

Publication: Article, Conference object
Date: 01 Jun 2019
Publisher: IEEE
Journal: 2019 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC)
Funded by: EC | ANITA

Authors: Konstantinos Gkountakos; Anastasios Dimou; Georgios Th. Papadopoulos; Petros Daras;

doi: 10.1109/ice.2019.8792602


Abstract

Video captioning has been heavily investigated by the research community in recent years, especially since the introduction of Recurrent Neural Networks (RNNs). Existing video captioning approaches are usually based on sequence-to-sequence models that aim to exploit visual information by detecting events or objects, or by matching entities to words. However, the exploitation of contextual information that can be extracted from the vocabulary has not yet been investigated, except in approaches that make use of parts of speech such as verbs, nouns, and adjectives. The proposed approach is based on the assumption that textually similar captions should represent similar visual content. Specifically, we propose a novel loss function that penalizes wrongly predicted words and rewards correctly predicted ones based on the semantic cluster to which they belong. The proposed method is evaluated on two widely known video captioning datasets, Microsoft Research Video to Text (MSR-VTT) and Microsoft Research Video Description Corpus (MSVD). Finally, experimental analysis shows that the proposed method outperforms the baseline approach in most cases.
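
The abstract does not state the loss formula, so the following is only an illustrative sketch of the general idea of a cluster-aware word loss, not the authors' method. It assumes that every vocabulary word has already been assigned to a semantic cluster (for instance by clustering Word2Vec vectors, which is not shown here), and it softens the penalty when the top predicted word is wrong but lies in the same cluster as the ground-truth word. The function name `cluster_aware_loss`, the parameter `same_cluster_weight`, and the toy vocabulary are all hypothetical.

```python
import numpy as np


def cluster_aware_loss(probs, targets, cluster_of, same_cluster_weight=0.5):
    """Mean negative log-likelihood with a per-step, cluster-dependent weight.

    probs:      (T, V) array; each row is a predicted distribution over V words
    targets:    (T,) array of ground-truth word indices
    cluster_of: (V,) array mapping each word index to a semantic cluster id
                (assumed to be precomputed, e.g. from Word2Vec vectors)
    same_cluster_weight: hypothetical factor below 1 applied when the top
                prediction is wrong but shares the target's cluster
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets)
    cluster_of = np.asarray(cluster_of)

    steps = np.arange(len(targets))
    # Standard per-step negative log-likelihood of the ground-truth word.
    nll = -np.log(probs[steps, targets] + 1e-12)

    # Compare the top predicted word with the target, word-wise and cluster-wise.
    predicted = probs.argmax(axis=1)
    wrong = predicted != targets
    same_cluster = cluster_of[predicted] == cluster_of[targets]

    # Full penalty by default; reduced penalty for a "near miss" in the same cluster.
    weights = np.where(wrong & same_cluster, same_cluster_weight, 1.0)
    return float(np.mean(weights * nll))


# Toy vocabulary (hypothetical): 0="dog", 1="puppy", 2="car", 3="runs"
clusters = np.array([0, 0, 1, 2])
target = np.array([0])  # ground truth: "dog"

near_miss = np.array([[0.30, 0.60, 0.05, 0.05]])  # top word "puppy", same cluster
far_miss = np.array([[0.30, 0.05, 0.60, 0.05]])   # top word "car", other cluster

print(cluster_aware_loss(near_miss, target, clusters))
print(cluster_aware_loss(far_miss, target, clusters))
```

In this toy case both predictions assign the same probability to the correct word, so a plain cross-entropy loss would treat them identically; the cluster weight is what makes the semantically closer mistake cheaper. How the published method actually defines its clusters, penalties, and rewards is described in the paper itself.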

Related Organizations

Information Technology Institute
Egypt
Centre for Research and Technology Hellas
Greece

Keywords

Video captioning, Recurrent Neural Network, Textual information, Word2Vec, Encoder-decoder

Impact by BIP!

- Selected citations: 8. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Top 10%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.