Downloads provided by UsageCounts
This study presents a large-scale benchmark of cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40,158 clean and noisy speech files (about 101 hours) are tested. The effect of background noise on STT quality is also evaluated at 5 different signal-to-noise ratios from 40 dB to 0 dB. Results show that Microsoft Azure provided the lowest transcription error rate, 9.09%, on clean speech, with high robustness to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained usage. Although IBM Watson works correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its application in real-life situations.
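The abstract does not say how noise was mixed into the speech files or how the transcription error rate was computed. As a minimal sketch only, assuming a power-based signal-to-noise ratio and a word-level error rate (word edit distance divided by reference length), the two quantities could be computed as follows. The function names `mix_at_snr` and `word_error_rate` are hypothetical and are not taken from the study.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mix has the requested SNR in dB.

    Assumption: SNR is defined on average sample power; the study's exact
    mixing procedure is not given in the abstract.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise is silent, so the SNR is undefined")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    mixed = [s + gain * n for s, n in zip(speech, noise)]
    return mixed, gain


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: this is the usual WER definition; the abstract only says
    "transcription error rate".
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution and one deletion against a 4-word reference.
print(word_error_rate("le chat dort ici", "le chien dort"))  # 0.5
```

Under this definition, a lower target SNR gives a larger noise gain, so 0 dB means the noise carries the same average power as the speech.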
6th National Conference on Practical Applications of Artificial Intelligence, 2021, Bordeaux, France
FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Speech-To-Text, Benchmarking, French language, Google Cloud, Microsoft Azure Cognitive Services, Amazon Transcribe, IBM Watson, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
| Indicator | Description | Value |
|---|---|---|
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 0 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Average |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Average |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Average |
| views | | 7 |
| downloads | | 8 |

Views provided by UsageCounts