Publication · Conference object · Preprint · 2018

Neural Baby Talk

Lu, Jiasen; Yang, Jianwei; Batra, Dhruv; Parikh, Devi
Open Access
  • Published: 26 Mar 2018
  • Publisher: IEEE
Abstract
Comment: 12 pages, 7 figures, CVPR 2018
Subjects
ACM Computing Classification System: Computing Methodologies → Image Processing and Computer Vision
Free text keywords: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Computation and Language
References (51 total; page 1 of 4 shown)

[1] P. Anderson, B. Fernando, M. Johnson, and S. Gould. SPICE: Semantic propositional image caption evaluation. In ECCV, 2016. 6

[2] P. Anderson, B. Fernando, M. Johnson, and S. Gould. Guided open vocabulary image captioning with constrained beam search. In EMNLP, 2017. 3, 7, 8

[3] P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang. Bottom-up and top-down attention for image captioning and visual question answering. In CVPR, 2018. 3, 5, 6, 7, 8

[4] L. Anne Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR, 2016. 3, 8

[5] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering. In ICCV, 2015. 1

[6] D. Chen and C. Manning. A fast and accurate dependency parser using neural networks. In EMNLP, 2014. 5

[7] X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015. 3

[8] A. Das, H. Agrawal, C. L. Zitnick, D. Parikh, and D. Batra. Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? In EMNLP, 2016. 1

[9] A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. M. Moura, D. Parikh, and D. Batra. Visual Dialog. In CVPR, 2017. 1

[10] M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL 2014 Workshop on Statistical Machine Translation, 2014. 6

[11] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015. 3

[12] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015. 1, 3

[13] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010. 2, 3

[14] K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In ICCV, 2017. 5

[15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 5
