publication . Preprint . 2016

Achieving Human Parity in Conversational Speech Recognition

Xiong, W.; Droppo, J.; Huang, X.; Seide, F.; Seltzer, M.; Stolcke, A.; Yu, D.; Zweig, G.
Open Access English
  • Published: 17 Oct 2016
Comment: Revised for publication, updated results
free text keywords: Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
68 references, page 1 of 5

[1] M. Campbell, A. J. Hoane, and F.-h. Hsu, “Deep Blue”, Artificial intelligence, vol. 134, pp. 57-83, 2002.

[2] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., “Mastering the game of Go with deep neural networks and tree search”, Nature, vol. 529, pp. 484-489, 2016.

[3] D. Amodei, R. Anubhai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, J. Chen, M. Chrzanowski, A. Coates, G. Diamos, et al., “Deep Speech 2: End-to-end speech recognition in English and Mandarin”, arXiv preprint arXiv:1512.02595, 2015.

[4] T. T. Kristjansson, J. R. Hershey, P. A. Olsen, S. J. Rennie, and R. A. Gopinath, “Super-human multi-talker speech recognition: the IBM 2006 Speech Separation Challenge system”, in Proc. Interspeech, vol. 12, p. 155, 2006.

[5] C. Weng, D. Yu, M. L. Seltzer, and J. Droppo, “Single-channel mixed speech recognition using deep neural networks”, in Proc. IEEE ICASSP, pp. 5632-5636. IEEE, 2014.

[6] D. S. Pallett, “A look at NIST's benchmark ASR tests: past, present, and future”, in IEEE Automatic Speech Recognition and Understanding Workshop, pp. 483-488. IEEE, 2003.

[7] P. Price, W. M. Fisher, J. Bernstein, and D. S. Pallett, “The DARPA 1000-word resource management database for continuous speech recognition”, in Proc. IEEE ICASSP, pp. 651-654. IEEE, 1988.

[8] D. B. Paul and J. M. Baker, “The design for the Wall Street Journal-based CSR corpus”, in Proceedings of the workshop on Speech and Natural Language, pp. 357-362. Association for Computational Linguistics, 1992.

[9] D. Graff, Z. Wu, R. MacIntyre, and M. Liberman, “The 1996 broadcast news speech and language-model corpus”, in Proceedings of the DARPA Workshop on Spoken Language technology, pp. 11-14, 1997.

[10] J. J. Godfrey, E. C. Holliman, and J. McDaniel, “Switchboard: Telephone speech corpus for research and development”, in Proc. IEEE ICASSP, vol. 1, pp. 517-520. IEEE, 1992.

[11] C. Cieri, D. Miller, and K. Walker, “The Fisher corpus: a resource for the next generations of speech-to-text”, in LREC, vol. 4, pp. 69-71, 2004.

[12] S. F. Chen, B. Kingsbury, L. Mangu, D. Povey, G. Saon, H. Soltau, and G. Zweig, “Advances in speech transcription at IBM under the DARPA EARS program”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1596-1608, 2006.

[13] S. Matsoukas, J.-L. Gauvain, G. Adda, T. Colthurst, C.-L. Kao, O. Kimball, L. Lamel, F. Lefevre, J. Z. Ma, J. Makhoul, et al., “Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1541-1556, 2006.

[14] A. Stolcke, B. Chen, H. Franco, V. R. R. Gadde, M. Graciarena, M.-Y. Hwang, K. Kirchhoff, A. Mandal, N. Morgan, X. Lei, et al., “Recent innovations in speech-to-text transcription at SRI-ICSI-UW”, IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1729-1744, 2006.

[15] A. Ljolje, “The AT&T 2001 LVCSR system”, NIST LVCSR Workshop, 2001.
