Automatic Caption Generation for News Images

Publication: Article, 01 Apr 2013

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 35, pages 797-812 (issn: 0162-8828, eissn: 2160-9292)

Authors: Yansong Feng; Mirella Lapata

doi: 10.1109/tpami.2012.118

pmid: 22641700


Abstract

This paper is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Examples include video and image retrieval as well as the development of tools that aid visually impaired individuals to access pictorial information. Our approach leverages the vast resource of pictures available on the web and the fact that many of them are captioned and colocated with thematically related documents. Our model learns to create captions from a database of news articles, the pictures embedded in them, and their captions, and consists of two stages. Content selection identifies what the image and accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. The model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics) and is trained on a weakly labeled dataset (which treats the captions and associated news articles as image labels). Inspired by recent work in summarization, we propose extractive and abstractive surface realization models. Experimental results show that it is viable to generate captions that are pertinent to the specific content of an image and its associated article, while permitting creativity in the description. Indeed, the output of our abstractive model compares favorably to handwritten captions and is often superior to extractive methods.
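
The extractive side of surface realization can be illustrated with a small sketch. This is not the authors' model: it assumes the content selection stage has already produced keyword probabilities for the image, and it simply picks the article sentence whose words best match those keywords. The function names, the scoring rule, and the length normalization are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentence(sentence: str, keyword_probs: Dict[str, float]) -> float:
    """Sum the probabilities of the distinct keywords found in the sentence,
    divided by the sentence length so long sentences are not favored.
    (Hypothetical scoring rule, not taken from the paper.)"""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(keyword_probs.get(w, 0.0) for w in set(tokens)) / len(tokens)


def extractive_caption(article: str, keyword_probs: Dict[str, float]) -> str:
    """Return the article sentence that best matches the image keywords,
    or an empty string if the article contains no sentences."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    if not sentences:
        return ""
    return max(sentences, key=lambda s: score_sentence(s, keyword_probs))


if __name__ == "__main__":
    # Keyword probabilities stand in for the output of an image annotation
    # model; the values here are made up for the example.
    keywords = {"dog": 0.4, "river": 0.3, "firefighters": 0.2, "rain": 0.1}
    article = (
        "The council met on Monday. "
        "Firefighters rescued a dog from the flooded river. "
        "Rain is expected to continue."
    )
    print(extractive_caption(article, keywords))
```

An abstractive model would go further and compose new wording from the selected content instead of copying a sentence verbatim, which is the distinction the abstract draws between the two approaches.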

Related Organizations

Wuhan Engineering Science & Technology Institute, China (People's Republic of)
Peking University, China (People's Republic of)
University of Edinburgh, United Kingdom

Keywords

Databases, Factual; Image Processing, Computer-Assisted; Information Storage and Retrieval; Newspapers as Topic; Reproducibility of Results; Models, Theoretical; Natural Language Processing

Impact by BIP!

selected citations: 64
These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

popularity: Top 10%
This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.

influence: Top 1%
This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).

impulse: Top 10%
This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.