Downloads provided by UsageCounts
arXiv: 2201.09486
Automated speaker recognition uses data processing to identify speakers by their voice. Today, automated speaker recognition is deployed on billions of smart devices and in services such as call centres. Despite its wide-scale deployment and known sources of bias in related domains like face recognition and natural language processing, bias in automated speaker recognition has not been studied systematically. We present an in-depth empirical and analytical study of bias in the machine learning development workflow of speaker verification, a voice biometric and core task in automated speaker recognition. Drawing on an established framework for understanding sources of harm in machine learning, we show that bias exists at every development stage in the well-known VoxCeleb Speaker Recognition Challenge, including data generation, model building, and implementation. Most affected are female speakers and non-US nationalities, who experience significant performance degradation. Leveraging the insights from our findings, we make practical recommendations for mitigating bias in automated speaker recognition, and outline future research directions.
FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), bias, evaluation, Computer Science - Computation and Language, speaker recognition, fairness, audit, Computer Science - Sound, Machine Learning (cs.LG), Computer Science - Computers and Society, Audio and Speech Processing (eess.AS), Computers and Society (cs.CY), FOS: Electrical engineering, electronic engineering, information engineering, speaker verification, Computation and Language (cs.CL), Electrical Engineering and Systems Science - Audio and Speech Processing
| Indicator | Description | Value |
| --- | --- | --- |
| selected citations | These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 36 |
| popularity | This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% |
| influence | This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% |
| impulse | This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 1% |
| views | | 26 |
| downloads | | 25 |

Views provided by UsageCounts