
Adaptive DCTNet for audio signal classification

Yin Xian; Yunchen Pu; Zhe Gan; Liang Lu; Andrew Thompson
Open Access
  • Published: 12 Dec 2016
  • Publisher: IEEE
Abstract
In this paper, we investigate DCTNet for audio signal classification. Its output features are related to Cohen's class of time-frequency distributions. We introduce an adaptive DCTNet (A-DCTNet) for audio signal feature extraction. The A-DCTNet applies the idea of the constant-Q transform, with the center frequencies of its filterbanks geometrically spaced. The A-DCTNet adapts to different acoustic scales, and it captures low-frequency acoustic information, to which human auditory perception is sensitive, better than features such as Mel-frequency spectral coefficients (MFSC). We use the features extracted by the A-DCTNet as input for classifiers. Experimental resu...
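
The geometric spacing of filterbank center frequencies described in the abstract follows the constant-Q convention: each center frequency is a fixed ratio above the previous one, so bandwidth grows in proportion to frequency. A minimal sketch of how such center frequencies can be generated (not taken from the paper; f_min, f_max, and bins_per_octave are illustrative assumptions):

    import numpy as np

    def constant_q_center_freqs(f_min=32.7, f_max=8000.0, bins_per_octave=12):
        """Geometrically spaced center frequencies, constant-Q style:
        each bin is 2**(1/bins_per_octave) times the previous one.
        Parameter values here are illustrative, not from the paper."""
        n_bins = int(np.ceil(np.log2(f_max / f_min) * bins_per_octave))
        return f_min * 2.0 ** (np.arange(n_bins) / bins_per_octave)

    # Example: 12 bins per octave from ~32.7 Hz up to 8 kHz
    freqs = constant_q_center_freqs()
    print(freqs[:5])  # low-frequency bins sit close together on a linear axis

Because the spacing is geometric, the low-frequency bands are narrow and densely packed, which is consistent with the abstract's claim that the A-DCTNet resolves low-frequency content more finely than linearly spaced or Mel-warped features.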
Subjects
free text keywords: Computer Science - Sound, Speech recognition, Feature extraction, Time–frequency analysis, Classification rate, Perception, Audio signal, Audio signal classification, Recurrent neural network, Computer science, Signal processing