Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition

Preprint, English, Open Access
Zhang, Zewang; Sun, Zheng; Liu, Jiaqi; Chen, Jingwen; Huo, Zhao; Zhang, Xiao
(2016)
  • Subject: Computer Science - Computation and Language | Computer Science - Learning

A deep learning approach has been widely applied to sequence modeling problems. In automatic speech recognition (ASR), performance has improved significantly thanks to larger speech corpora and deeper neural networks. In particular, recurrent neural netw…