Answer-Type Prediction for Visual Question Answering

Publication: Article, Conference object
Date: 01 Jun 2016
Country: United States
Publisher: IEEE
Journal: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kafle, Kushal; Kanan, Christopher

doi: 10.1109/cvpr.2016.538

Abstract

Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued. In this paper, we build a system capable of answering open-ended text-based questions about images, which is known as Visual Question Answering (VQA). Our approach's key insight is that we can predict the form of the answer from the question. We formulate our solution in a Bayesian framework. When our approach is combined with a discriminative model, the combined model achieves state-of-the-art results on four benchmark datasets for open-ended VQA: DAQUAR, COCO-QA, The VQA Dataset, and Visual7W.
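The abstract's key idea, predicting the form of the answer from the question and combining it with answer evidence in a Bayesian way, can be illustrated with a small sketch. This is not the authors' model: it only applies the law of total probability, P(a | q, x) = sum over t of P(a | t, q, x) * P(t | q), and the function name, the answer types, and all probabilities are hypothetical values chosen for illustration.

```python
from typing import Dict


def answer_posterior(
    p_type_given_q: Dict[str, float],
    p_answer_given_type: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Marginalize over answer types.

    p_type_given_q: P(t | q), the answer-type distribution predicted from the question.
    p_answer_given_type: P(a | t, q, x), the answer distribution within each type.
    Returns P(a | q, x) = sum_t P(a | t, q, x) * P(t | q).
    """
    posterior: Dict[str, float] = {}
    for answer_type, p_type in p_type_given_q.items():
        for answer, p_answer in p_answer_given_type[answer_type].items():
            posterior[answer] = posterior.get(answer, 0.0) + p_type * p_answer
    return posterior


# Made-up numbers for a question such as "What color is the bus?"
# (assumption: two hypothetical answer types, "color" and "count").
p_type = {"color": 0.9, "count": 0.1}
p_answer = {
    "color": {"red": 0.7, "blue": 0.3},
    "count": {"two": 0.6, "three": 0.4},
}

posterior = answer_posterior(p_type, p_answer)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))
```

In this toy setting the question alone already concentrates probability mass on color answers, so answers of other types are down-weighted before any image evidence is weighed.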

Country

United States

Related Organizations

Rochester Institute of Technology
United States

Keywords

machine learning, natural language processing, computer vision

Impact by BIP!

Selected citations: 70. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 1%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 10%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.