D. Gerhard, Audio signal classification: History and current techniques, Citeseer, 2003.
S. Stevens, J. Volkmann, and E. Newman, “A scale for the measurement of the psychological magnitude pitch,” The Journal of the Acoustical Society of America, vol. 8, no. 3, pp. 185-190, 1937.
S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28, no. 4, pp. 357-366, 1980.
J. Openshaw and J. Mason, “On the limitations of cepstral features in noise,” in Acoustics, Speech, and Signal Processing, 1994. ICASSP-94., 1994 IEEE International Conference on. IEEE, 1994, vol. 2, pp. II-49.
Y. LeCun and Y. Bengio, “Convolutional networks for images, speech, and time series,” The Handbook of Brain Theory and Neural Networks, vol. 3361, no. 10, 1995.
 G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. Sainath, “Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups,” Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012.
H. Lee, P. Pham, Y. Largman, and A. Ng, “Unsupervised feature learning for audio classification using convolutional deep belief networks,” in Advances in Neural Information Processing Systems, 2009, pp. 1096-1104.
 V. Jain and S. Seung, “Natural image denoising with convolutional networks,” in Advances in Neural Information Processing Systems, 2009, pp. 769-776.
 T. Sainath, B. Kingsbury, A. Mohamed, and B. Ramabhadran, “Learning filter banks within a deep neural network framework,” in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013, pp. 297-302.
 S. Mallat, “Group invariant scattering,” Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331-1398, 2012.
J. Andén and S. Mallat, “Deep scattering spectrum,” Signal Processing, IEEE Transactions on, vol. 62, no. 16, pp. 4114-4128, 2014.
 T. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “PCANet: A simple deep learning baseline for image classification?,” arXiv preprint arXiv:1404.3606, 2014.
 Y. Xian, Detection and classification of whale vocalizations, Ph.D. thesis, Duke University, 2015.
 Y. Xian, A. Thompson, X. Sun, D. Nowacek, and L. Nolte, “DCTNet and PCANet for acoustic signal feature extraction,” arXiv preprint arXiv:1404.3606, 2016.
C. Ng and A. Teoh, “DCTNet: A simple learning-free approach for face recognition,” arXiv preprint arXiv:1507.02049, 2015.