A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect

Publication: Article, Conference object, Preprint. Date: 01 Jan 2021. Embargo end date: 01 Jan 2021. Publisher: Zenodo. Journal: CoRR, volume abs/2105.03409

Authors: Binbin Xu; Chongyang Tao; Zidu Feng; Youssef Raqui; Sylvie Ranwez

doi: 10.48550/arxiv.2105.03409, 10.5281/zenodo.8117300, 10.5281/zenodo.8117301

arXiv: 2105.03409


Abstract

This study presents a large-scale benchmark of four cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40158 clean and noisy speech files, amounting to about 101 hours, are tested. The effect of background noise on STT quality is also evaluated at 5 different signal-to-noise ratios from 40 dB to 0 dB. Results show that Microsoft Azure provided the lowest transcription error rate, 9.09%, on clean speech, with high robustness to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained usage. Though IBM Watson could work correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its application in real-life situations.
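As a rough illustration of the two quantities the abstract relies on, the sketch below mixes noise into a clean signal at a target signal-to-noise ratio and scores a transcript against its reference. It is not the authors' pipeline: the function names `mix_at_snr` and `word_error_rate` are hypothetical, the signals are synthetic, and it assumes the "transcription error rate" is a word error rate (word-level edit distance divided by reference length), which the abstract does not state explicitly.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances against an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for a speech clip and a noise recording (assumption).
    speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    for snr_db in (40, 20, 0):  # illustrative levels within the 40 dB to 0 dB range
        mixed = mix_at_snr(speech, noise, snr_db)
        print(snr_db, "dB, peak amplitude:", max(abs(x) for x in mixed))
    print(word_error_rate("le chat dort sur le canapé", "le chat dors sur canapé"))
```

In a real benchmark the `hypothesis` string would come from each cloud service's transcription of the mixed audio, and text normalization (case, punctuation, numerals) would need to be settled before scoring.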

6th National Conference on Practical Applications of Artificial Intelligence, 2021, Bordeaux, France

Related Organizations

University of Montpellier
France
EuroMov - Digital Health in Motion
France

Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Speech-To-Text, Benchmarking, French language, Google Cloud, Microsoft Azure Cognitive Services, Amazon Transcribe, IBM Watson, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Impact by BIP!

	selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
	influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
	impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.