Reconstruction Network for Video Captioning

Publication: Article, Preprint, Conference object, 01 Jun 2018
Embargo end date: 01 Jan 2018
Publisher: IEEE
Journal: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Authors: Bairui Wang; Lin Ma 0002; Wei Zhang 0021; Wei Liu 0005

doi: 10.1109/cvpr.2018.00795, 10.48550/arxiv.1803.11438

arXiv: 1803.11438

Abstract

This paper addresses the problem of describing the visual content of a video sequence in natural language. Unlike previous video captioning work, which mainly exploits cues from the video content to generate a description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture that leverages both the forward (video to sentence) and backward (sentence to video) flows. Specifically, the encoder-decoder uses the forward flow to produce a sentence description from the encoded video semantic features. Two types of reconstructors are designed to exploit the backward flow and reproduce the video features from the hidden state sequence generated by the decoder. The generation loss yielded by the encoder-decoder and the reconstruction loss introduced by the reconstructor are used jointly to train RecNet in an end-to-end fashion. Experimental results on benchmark datasets demonstrate that the proposed reconstructor boosts encoder-decoder models and leads to significant gains in video captioning accuracy.
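
The joint objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the mean-pooling reconstructor, the Euclidean reconstruction distance, and the trade-off weight `lam` are all assumptions made for illustration, since the abstract only states that a generation loss and a reconstruction loss are combined during training.

```python
import math


def generation_loss(token_probs):
    """Negative log-likelihood of the reference caption (forward flow).

    token_probs[t] is the probability the decoder assigned to the
    ground-truth word at step t.
    """
    return -sum(math.log(p) for p in token_probs)


def mean_pool(vectors):
    """Average a sequence of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def reconstruct(hidden_states, weights):
    """Hypothetical reconstructor (backward flow).

    Mean-pools the decoder hidden states, then applies a linear map.
    `weights` has shape (feature_dim, hidden_dim).
    """
    pooled = mean_pool(hidden_states)
    return [sum(w * h for w, h in zip(row, pooled)) for row in weights]


def reconstruction_loss(reconstructed, video_features):
    """Euclidean distance between the reconstructed vector and the
    mean of the original video features (an assumed choice of distance)."""
    target = mean_pool(video_features)
    return math.sqrt(sum((r - t) ** 2 for r, t in zip(reconstructed, target)))


def joint_loss(token_probs, hidden_states, weights, video_features, lam=0.2):
    """Generation loss plus a weighted reconstruction loss.

    `lam` is an assumed trade-off weight, not a value from the source.
    """
    gen = generation_loss(token_probs)
    rec = reconstruction_loss(reconstruct(hidden_states, weights), video_features)
    return gen + lam * rec
```

With `lam=0` the objective reduces to plain encoder-decoder training; a positive `lam` additionally penalizes decoder hidden states from which the video features cannot be recovered.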

Accepted by CVPR 2018

Related Organizations

Shandong Women’s University, China (People's Republic of)
Tencent (China), China (People's Republic of)

Keywords

FOS: Computer and information sciences, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition

Impact by BIP!

Selected citations: 212. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.