Explaining Deep Convolutional Neural Networks on Music Classification
Computer Science - Sound | Computer Science - Artificial Intelligence | Computer Science - Multimedia | Computer Science - Learning
Deep convolutional neural networks (CNNs) have been actively adopted in the field of music information retrieval, e.g. genre classification, mood detection, and chord recognition. However, how these networks learn and make predictions is little understood, particularly when they are applied to spectrograms. To understand their underlying mechanism, we introduce auralisation of a CNN, which is based on a deconvolution procedure introduced in . Auralisation of a CNN converts the learnt convolutional features obtained from deconvolution into audio signals. In the experiments and discussions, we explain the trained features of a 5-layer CNN using the deconvolved spectrograms and auralised signals. We also investigate the pairwise correlations per layer under varying musical attributes to understand how the learnt features evolve. It is shown that in the deep layers, the features are learnt to capture textures, i.e. patterns of continuous distributions, rather than the shapes of lines.
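The auralisation step, turning a deconvolved spectrogram back into sound, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the deconvolved output is a linear-frequency STFT magnitude, and it borrows the phase of the original input's STFT, a common choice when only magnitudes are available. The helper names `stft`, `istft` and `auralise` and the parameters `n_fft=512` and `hop=128` are hypothetical.

```python
import numpy as np


def _window(n_fft):
    # Periodic Hann window (illustrative choice of analysis/synthesis window).
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=512, hop=128):
    """Complex STFT with centre padding; returns shape (freq_bins, frames)."""
    window = _window(n_fft)
    pad = n_fft // 2
    # Extra `hop` samples at the end so the tail of x is fully covered.
    x_pad = np.pad(x, (pad, pad + hop))
    n_frames = 1 + (len(x_pad) - n_fft) // hop
    frames = np.stack(
        [x_pad[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T


def istft(spec, length, n_fft=512, hop=128):
    """Weighted overlap-add inverse of `stft`, cropped to `length` samples."""
    window = _window(n_fft)
    frames = np.fft.irfft(spec.T, n=n_fft, axis=1) * window  # synthesis window
    n_frames = frames.shape[0]
    out_len = n_fft + hop * (n_frames - 1)
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for t in range(n_frames):
        y[t * hop : t * hop + n_fft] += frames[t]
        norm[t * hop : t * hop + n_fft] += window ** 2
    nonzero = norm > 1e-8
    y[nonzero] /= norm[nonzero]
    pad = n_fft // 2
    return y[pad : pad + length]


def auralise(deconv_mag, x, n_fft=512, hop=128):
    """Convert a (hypothetical) deconvolved magnitude spectrogram into audio.

    Assumption: the magnitude is paired with the phase of the original
    input's STFT and then inverted with overlap-add.
    """
    phase = np.angle(stft(x, n_fft, hop))
    return istft(deconv_mag * np.exp(1j * phase), len(x), n_fft, hop)
```

Passing the unmodified magnitude of the input returns the input signal up to floating-point error, which is a quick sanity check of the inversion. In an actual analysis, `deconv_mag` would be the deconvolved spectrogram for a chosen feature map, which must have the same shape as the input's STFT.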