Publication · Preprint · 2018

Open Source Automatic Speech Recognition for German

Milde, Benjamin; Köhn, Arne
Open Access · English
Published: 26 Jul 2018
Abstract
High-quality Automatic Speech Recognition (ASR) is a prerequisite for speech-based applications and research. While state-of-the-art ASR software is freely available, language-dependent acoustic models are lacking for languages other than English because freely available training data is limited. We train acoustic models for German with Kaldi on two datasets, both distributed under a Creative Commons license. The resulting model is freely redistributable, lowering the cost of entry for German ASR. The models are trained on a total of 412 hours of German read speech, and we achieve a relative word error reduction of 26% by adding data...
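The 26% figure above is a relative reduction in word error rate (WER). As a reading aid, here is a minimal Python sketch of the standard definitions. It is not the authors' scoring pipeline (Kaldi ships its own scoring tools), and the function names and example sentences are hypothetical. WER is the word-level edit distance (substitutions + deletions + insertions) between reference and hypothesis, summed over a test set and divided by the total number of reference words. A relative reduction is (WER_baseline - WER_new) / WER_baseline, so a 26% relative reduction means the new WER is 74% of the baseline WER.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words: the minimum number of
    substitutions, deletions and insertions turning ref into hyp."""
    # prev[j] = distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors / total reference words.
    Each pair is (reference transcript, recognizer hypothesis)."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    ref_words = sum(len(r.split()) for r, _ in pairs)
    if ref_words == 0:
        raise ValueError("references must contain at least one word")
    return errors / ref_words


def relative_reduction(wer_baseline: float, wer_new: float) -> float:
    """Fraction by which wer_new improves on wer_baseline."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_new) / wer_baseline


if __name__ == "__main__":
    # Hypothetical example: one deleted word out of five reference words.
    print(corpus_wer([("das ist ein kleiner test", "das ist kleiner test")]))
    # Hypothetical WERs (in %), not results from the paper.
    print(relative_reduction(10.0, 7.5))
```

For illustration only (hypothetical numbers, not results from the paper): a drop from 10.0% to 7.5% WER is a 25% relative reduction even though the absolute drop is only 2.5 points. The corpus-level function sums errors before dividing rather than averaging per-utterance WERs, so longer utterances carry proportionally more weight.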
Subjects
Free-text keywords: Computer Science - Computation and Language