A Simple Fusion Of Deep And Shallow Learning For Acoustic Scene Classification

Publication type: Conference object, Article, Preprint, Other literature type
Publication date: 01 Jan 2018
Embargo end date: 01 Jun 2018
Publisher: Zenodo
Journal: CoRR, volume abs/1806.07506
Funded by: EC | AudioCommons, EC | COMPMUSIC

Authors: Fonseca, Eduardo; Gong, Rong; Serra, Xavier

doi: 10.5281/zenodo.1422582, 10.5281/zenodo.1422583, 10.48550/arxiv.1806.07506

arXiv: 1806.07506

handle: 10230/36757


Abstract

In the past, Acoustic Scene Classification systems have been based on hand-crafting audio features that are input to a classifier. Nowadays, the common trend is to adopt data-driven techniques, e.g., deep learning, where audio representations are learned from data. In this paper, we propose a system that consists of a simple fusion of two methods of the aforementioned types: a deep learning approach, where log-scaled mel-spectrograms are input to a convolutional neural network, and a feature engineering approach, where a collection of hand-crafted features is input to a gradient boosting machine. We first show that both methods provide complementary information to some extent. Then, we use a simple late fusion strategy to combine both methods. We report classification accuracy of each method individually and the combined system on the TUT Acoustic Scenes 2017 dataset. The proposed fused system outperforms each of the individual methods and attains a classification accuracy of 72.8% on the evaluation set, improving the baseline system by 11.8%.
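As an illustration of what a late fusion of two classifiers can look like, the sketch below combines per-class probabilities from two models with a weighted arithmetic mean. This is only one possible fusion rule and an assumption made here for illustration; the abstract does not state which rule the authors use. The function names `late_fusion` and `predict`, the `weight_a` parameter, and the toy probability values are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def late_fusion(probs_a: Sequence[float],
                probs_b: Sequence[float],
                weight_a: float = 0.5) -> List[float]:
    """Weighted arithmetic mean of two per-class probability vectors.

    Assumption: both models output probabilities over the same classes,
    in the same order. The averaging rule itself is illustrative only.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(probs_a, probs_b)]


def predict(probs: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Toy scores for one audio clip over three hypothetical scene classes.
cnn_probs = [0.50, 0.40, 0.10]   # stand-in for the CNN output
gbm_probs = [0.20, 0.70, 0.10]   # stand-in for the gradient boosting output

fused = late_fusion(cnn_probs, gbm_probs)
print(predict(cnn_probs), predict(gbm_probs), predict(fused))
```

In this toy case the two models disagree on the clip, and the fused scores follow the model that is more confident. Because fusion happens on the outputs only, each model can be trained and tuned independently.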

This work is partially supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688382 “AudioCommons”, and the European Research Council under the European Union’s Seventh Framework Program, as part of the CompMusic project (ERC grant agreement 267583), and a Google Faculty Research Award 2017. We are grateful for the GPUs donated by NVidia.

Paper presented at: 15th Sound and Music Computing Conference (SMC2018). Sonic crossing, held in Limassol, Cyprus, 4 to 7 July 2018.

Country

Spain

Keywords

FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Statistics - Machine Learning, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Machine Learning (stat.ML), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG)

Impact by BIP!

- Selected citations (derived from selected sources): 0
- Popularity (current attention, based on the underlying citation network): Average
- Influence (overall impact, based on the underlying citation network): Average
- Impulse (initial momentum directly after publication): Average