R. F. Lyon, “Machine hearing: An emerging field [exploratory dsp],” Ieee signal processing magazine, vol. 27, no. 5, pp. 131-139, 2010.
 P. Over, J. Fiscus, G. Sanders, D. Joy, M. Michel, G. Awad, A. Smeaton, W. Kraaij, and G. Que´not, “Trecvid 2014-an overview of the goals, tasks, data, evaluation mechanisms and metrics,” in Proceedings of TRECVID, 2014, p. 52. [OpenAIRE]
 S. Pancoast, M. Akbacak, and M. Sanchez, “Supervised Acoustic Concept Extraction for Multimedia Event Detection,” in ACM International Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis at ACM Multimedia, 2012. [OpenAIRE]
 B. Elizalde, M. Ravanelli, and G. Friedland, “Audio concept ranking for video event detection on user-generated content.” in Proceedings of SLAM@INTERSPEECH, 2013.
 S. Burger, Q. Jin, P. F. Schulam, and F. Metze, “ Noisemes: Manual Annotation of Environmental Noise in Audio Streams ,” Tech. Rep., 2012.
 J. Choi, B. Thomee, G. Friedland, L. Cao, K. Ni, D. Borth, B. Elizalde, L. Gottlieb, C. Carrano, R. Pearce et al., “The placing task: A largescale geo-estimation challenge for social-media videos and images,” in Proceedings of the 3rd ACM Multimedia Workshop on Geotagging and Its Applications in Multimedia. ACM, 2014, pp. 27-31.
 J. Salamon, C. Jacoby, and J. P. Bello, “A dataset and taxonomy for urban sound research,” in 22st ACM International Conference on Multimedia (ACM-MM'14), Orlando, FL, USA, Nov. 2014.
 K. J. Piczak, “ESC: dataset for environmental sound classification,” in Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26 - 30, 2015, 2015, pp. 1015-1018. [OpenAIRE]
 M. Janvier, X. Alameda-Pineda, L. Girin, and R. Horaud, “SoundEvent Recognition with a Companion Humanoid,” in Humanoids 2012 - IEEE International Conference on Humanoid Robotics. Osaka, Japan: IEEE, Nov. 2012, pp. 104-111. [Online]. Available: https://hal.inria.fr/hal-00768767 [OpenAIRE]
 D. Giannoulis, E. Benetos, D. Stowell, M. Rossignol, M. Lagrange, and M. D. Plumbley, “Detection and classification of acoustic scenes and events: an IEEE AASP challenge,” in 2013 IEEE WASPAA. [OpenAIRE]
 A. Mesaros, T. Heittola, and T. Virtanen, “TUT database for acoustic scene classification and sound event detection,” in 24th European Signal Processing Conference 2016, Budapest, Hungary, 2016. [OpenAIRE]
 M. Soleymani, Y.-H. Yang, Y.-G. Jiang, and S.-F. Chang, “Asm'15: The 1st international workshop on affect and sentiment in multimedia,” in Proceedings of the 23rd ACM international conference on Multimedia. ACM, 2015, pp. 1349-1349.
 S. Baccianella, A. Esuli, and F. Sebastiani, “Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining.”
 D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang, “Large-scale visual sentiment ontology and detectors using adjective noun pairs,” in Proceedings of the 21st ACM International Conference on Multimedia, ser. MM '13. New York, NY, USA: ACM, 2013, pp. 223-232.
 R. Schafer, The Soundscape: Our Sonic Environment and the Tuning of the World. Inner Traditions/Bear, 1993.