publication . Article . Preprint . 2018

AudioPairBank: towards a large-scale tag-pair-based audio content analysis

Sager, Sebastian; Elizalde, Benjamin; Borth, Damian; Schulze, Christian; Raj, Bhiksha; Lane, Ian
Open Access · English
  • Published: 15 Sep 2018 Journal: EURASIP Journal on Audio, Speech, and Music Processing (ISSN: 1687-4722)
  • Publisher: SpringerOpen
  • Country: Switzerland
Recently, sound recognition has been used to identify sounds such as "car" and "river". However, sounds have nuances that may be better described by adjective-noun pairs, such as "slow car", and verb-noun pairs, such as "flying insects", both of which are underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun pairs and verb-noun pairs. Because datasets with these kinds of annotations are lacking, we collected and processed the AudioPairBank corpus, which consists of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of col...
ACM Computing Classification System: ComputingMethodologies_PATTERNRECOGNITION
free text keywords: Sound event database, Audio content analysis, Machine learning, Signal processing, Acoustics. Sound, QC221-246, Electronic computers. Computer science, QA75.5-76.95, computer science, Computer Science - Sound, Computer Science - Computation and Language, Acoustics and Ultrasonics, Electrical and Electronic Engineering, Sound recognition, Documentation, Speech recognition