publication . Preprint . Conference object . 2018

Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification

Mingwen Dong
Open Access English
  • Published: 26 Feb 2018
Abstract
Music genre classification is one example of content-based analysis of music signals. Traditionally, human-engineered features have been used to automate this task, achieving 61% accuracy in 10-genre classification. However, this is still below the 70% accuracy that humans can achieve on the same task. Here, we propose a new method that combines knowledge from human perception studies of music genre classification with the neurophysiology of the auditory system. The method works by training a simple convolutional neural network (CNN) to classify a short segment of the music signal. Then, the genre of a piece of music is determined by splitting it into short segment...
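The segment-splitting step mentioned in the abstract can be illustrated with a minimal sketch. It only shows how a one-dimensional signal might be cut into fixed-length, possibly overlapping segments that a classifier could then label one at a time. The function name `split_into_segments`, the 3-second segment length, and the 1.5-second hop are assumptions made for illustration; the abstract states none of these values and does not describe the network's input format.

```python
from typing import List, Sequence


def split_into_segments(
    signal: Sequence[float],
    sample_rate: int,
    segment_seconds: float = 3.0,  # assumed value, not taken from the abstract
    hop_seconds: float = 1.5,      # assumed value, not taken from the abstract
) -> List[List[float]]:
    """Cut a 1-D signal into fixed-length segments.

    Segments start every `hop_seconds` and each lasts `segment_seconds`.
    A trailing piece shorter than one full segment is dropped.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    hop_len = int(round(hop_seconds * sample_rate))
    if seg_len <= 0 or hop_len <= 0:
        raise ValueError("segment and hop lengths must be positive")

    segments = []
    # The last valid start index is len(signal) - seg_len.
    for start in range(0, len(signal) - seg_len + 1, hop_len):
        segments.append(list(signal[start:start + seg_len]))
    return segments


if __name__ == "__main__":
    # A 10-second dummy signal at a toy sample rate of 100 Hz.
    dummy = [0.0] * 1000
    pieces = split_into_segments(dummy, sample_rate=100)
    print(len(pieces), len(pieces[0]))
```

Each returned segment would be passed to the trained classifier independently. A hop shorter than the segment length produces overlapping segments; setting the two equal produces non-overlapping ones.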
Subjects
free text keywords: Computer Science - Sound, Computer Science - Learning, Electrical Engineering and Systems Science - Audio and Speech Processing, Speech recognition, Convolutional neural network, Computer science

