
This paper presents a speaker-independent, large-vocabulary continuous speech recognition system for Arabic, built on deep neural network models trained with self-supervised learning. The system is based on the HuBERT model and achieves a word error rate (WER) of 19.3%. Experiments on different datasets indicate that the HuBERT-based system generalizes well across spoken dialects. We also conducted a statistical analysis of the Arabic-specific errors produced by the HuBERT-based system, which showed the need for an error-correcting language model to improve accuracy. After an Arabic language model was added, the WER decreased to 10.7%. Overall, this study demonstrates the potential of self-supervised speech recognition for Arabic and the importance of language models in improving system accuracy.
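For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the scoring tool or the text normalization used in the study.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    Arabic-specific normalization (diacritics, letter variants, etc.).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a hypothesis that substitutes one word and drops another from a four-word reference has a WER of 2/4 = 0.5. Because insertions are counted, WER can exceed 1.0 for very poor hypotheses.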
self-attention, Science, Q, speech recognition, deep learning, supervised learning, self-supervised learning
