Deep Recurrent Convolutional Neural Network: Improving Performance For Speech Recognition
- Published: 22 Nov 2016
- Sun Yat-sen University, China