AudioPairBank: towards a Large-Scale Tag-Pair-based Audio Content Analysis

Article, Preprint English OPEN
Sebastian Säger; Benjamin Elizalde; Damian Borth; Christian Schulze; Bhiksha Raj; Ian Lane
  • Publisher: Springer
  • Journal: EURASIP Journal on Audio, Speech, and Music Processing (ISSN: 1687-4722)
  • Related identifiers: doi: 10.1186/s13636-018-0137-5
  • Subject: Computer Science - Computation and Language | Computer Science - Sound | Acoustics. Sound | computer science | Signal processing | Sound event database | Electronic computers. Computer science | Machine learning | QC221-246 | QA75.5-76.95 | Audio content analysis
    acm: ComputingMethodologies_PATTERNRECOGNITION

Recently, sound recognition has been used to identify sounds such as car and river. However, sounds have nuances that may be better described by adjective-noun pairs such as slow car, and verb-noun pairs such as flying insects, which are underexplored. Therefore, in t...