publication . Preprint . Article . Other literature type . 2017

MirBot: A collaborative object recognition system for smartphones using convolutional neural networks

Antonio Pertusa; Antonio-Javier Gallego
Open Access English
  • Published: 09 Jun 2017
  • Country: Spain
Abstract
MirBot is a collaborative application for smartphones that allows users to perform object recognition. This app can be used to take a photograph of an object, select the region of interest and obtain the most likely class (dog, chair, etc.) by means of similarity search using features extracted from a convolutional neural network (CNN). The answers provided by the system can be validated by the user so as to improve the results for future queries. All the images are stored together with a series of metadata, thus enabling a multimodal incremental dataset labeled with synset identifiers from the WordNet ontology. This dataset grows continuously thanks to the user...
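The recognition step described in the abstract (similarity search over CNN features, followed by user validation that feeds back into the dataset) can be illustrated with a minimal sketch. This is not the authors' implementation: the `FeatureIndex` class, the cosine-similarity measure, the k-nearest-neighbour vote, and the three-dimensional toy vectors are all assumptions made for illustration. In a real system the vectors would come from a CNN layer and the labels would be WordNet synset identifiers.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FeatureIndex:
    """Hypothetical in-memory store of (feature vector, label) pairs."""

    def __init__(self):
        self.entries = []

    def add(self, features, label):
        # Storing a validated sample is what makes the dataset incremental.
        self.entries.append((list(features), label))

    def classify(self, features, k=3):
        # Rank stored samples by similarity to the query and let the
        # k most similar ones vote on the class.
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine_similarity(features, entry[0]),
            reverse=True,
        )
        votes = Counter(label for _, label in ranked[:k])
        return votes.most_common(1)[0][0] if votes else None


# Toy vectors standing in for CNN features (assumed values).
index = FeatureIndex()
index.add([0.9, 0.1, 0.0], "dog")
index.add([0.8, 0.2, 0.1], "dog")
index.add([0.1, 0.9, 0.2], "chair")

query = [0.85, 0.15, 0.05]
prediction = index.classify(query, k=3)

# If the user confirms the answer, the query joins the dataset
# and can influence future queries.
index.add(query, prediction)
```

A linear scan like this is only practical for small collections; the sketch is meant to show the feedback loop, not a scalable search structure.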
Subjects
free text keywords: Object recognition, Image datasets, Convolutional neural networks, Transfer learning, Multimodality, Human computer interaction, Lenguajes y Sistemas Informáticos, Computer Science - Computer Vision and Pattern Recognition, Cognitive Neuroscience, Artificial Intelligence, Computer Science Applications