Explicit Knowledge-based Reasoning for Visual Question Answering

Preprint English OPEN
Wang, Peng; Wu, Qi; Shen, Chunhua; Hengel, Anton van den; Dick, Anthony
  • Subject: Computer Science - Computation and Language | Computer Science - Computer Vision and Pattern Recognition

We describe a method for visual question answering which is capable of reasoning about contents of an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in th...
  • References (28)

    Q10: Which image is the most related to transportation? A10: The right one.
    Left Related Concepts: Attribute-furniture, 14; Attribute-office, 14; Object-cat, 7
    Right Related Concepts: Attribute-vehicles, 145; Attribute-road, 142; Object-highway, 112

    Q11: Which image is the most related to chef? A11: The left one.
    Left Related Concepts: Attribute-kitchen, 79; Object-oven, 15; Object-microwave, 8
    Right Related Concepts: Attribute-wood, 7; Attribute-computer, 3; Object-laptop, 3

    Q12: Which image is the most related to programmer? A12: The right one.
    Left Related Concepts: Object-dishwasher, 2; Attribute-house, 1; Object-oven, 1
    Right Related Concepts: Attribute-computer, 53; Object-laptop, 16; Object-mouse, 9

    ... were available, the method we have described could use it to draw sensible general conclusions about the content of images. We have also provided a dataset and methodology for testing the performance of general visual question answering techniques, and shown that Ahab substantially outperforms the currently predominant visual question answering approach when so tested.

    References

    [1] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering - Version 2. arXiv:1505.00468v2, 2015.

    [2] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives. DBpedia: A nucleus for a web of open data. Springer, 2007.

    [3] M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Open information extraction for the web. In Proc. Int. Joint Conf. Artificial Intelligence, 2007.

    [4] J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic Parsing on Freebase from Question-Answer Pairs. In Proc. Empirical Methods in Natural Language Processing, pages 1533–1544, 2013.

    [5] S. Bird, E. Klein, and E. Loper. Natural language processing with Python. O'Reilly Media, Inc., 2009.

    [6] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proc. ACM SIGMOD/PODS Conf., pages 1247–1250, 2008.

    [7] A. Bordes, S. Chopra, and J. Weston. Question answering with subgraph embeddings. arXiv:1406.3676, 2014.

    ... mentation. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014.

    [12] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009.

    [13] O. Erling. Virtuoso, a Hybrid RDBMS/Graph Column Store. IEEE Data Eng. Bull., 35(1):3–8, 2012.

    [14] O. Etzioni, A. Fader, J. Christensen, S. Soderland, and M. Mausam. Open Information Extraction: The Second Generation. In Proc. Int. Joint Conf. Artificial Intelligence, 2011.

    [15] A. Fader, S. Soderland, and O. Etzioni. Identifying relations for open information extraction. In Proc. Empirical Methods in Natural Language Processing, 2011.

    [16] A. Fader, L. Zettlemoyer, and O. Etzioni. Open question answering over curated and extracted knowledge bases. In Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2014.

    [17] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. In Proc. Int. Conf. Adv. Neural Information Processing Systems, 2015.

    [18] D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112(12):3618–3623, 2015.

    [19] R. Girshick. Fast R-CNN. arXiv:1504.08083, 2015.

    [20] R. W. Group et al. Resource description framework, 2014. http://www.w3.org/standards/techs/rdf.

    [21] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

    [22] J. Hoffart, F. M. Suchanek, K. Berberich, and
