
This paper explores several attack scenarios against a voice anonymization system using embedding alignment techniques. We use either Procrustes analysis or Wasserstein-Procrustes (an algorithm originally designed for unsupervised translation) to align two sets of x-vectors, extracted before and after voice anonymization, and thereby approximate the anonymization transformation as a rotation. We compute the optimal rotation and compare the results of this approximation to the official Voice Privacy Challenge results. We show that a complex system such as the Voice Privacy Challenge baseline can be approximated by a rotation estimated from a limited set of x-vectors. The paper thus studies the space of solutions for voice anonymization within the specific scope of rotations. Because rotations are reversible, the proposed method can recover up to 62% of the speaker identities from anonymized embeddings.
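The rotation-fitting idea in the abstract can be illustrated with a minimal sketch of the paired (orthogonal) Procrustes case only. It is not the paper's implementation, and it does not cover Wasserstein-Procrustes, which must also estimate the pairing between the two sets. Everything here is an assumption made for illustration: the function names `fit_rotation` and `invert_rotation` are hypothetical, the "x-vectors" are synthetic Gaussian vectors, and the "anonymizer" is a random orthogonal matrix. Note that the closed-form solution yields an orthogonal matrix, which may include a reflection as well as a rotation.

```python
import numpy as np


def fit_rotation(X, Y):
    """Return the orthogonal W minimizing ||X @ W - Y||_F.

    X and Y are (n, d) arrays whose rows are paired: row i of Y is
    assumed to be the anonymized counterpart of row i of X.
    """
    # Closed-form orthogonal Procrustes solution: SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def invert_rotation(Y, W):
    """Map anonymized vectors back; an orthogonal W has inverse W.T."""
    return Y @ W.T


# Synthetic stand-in data (assumption: real x-vectors are not used here).
rng = np.random.default_rng(0)
n, d = 200, 16
X = rng.standard_normal((n, d))                   # "original" embeddings
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # hidden orthogonal map
Y = X @ Q                                         # "anonymized" embeddings

W = fit_rotation(X, Y)
X_hat = invert_rotation(Y, W)

print("W is orthogonal:", np.allclose(W @ W.T, np.eye(d)))
print("hidden map recovered:", np.allclose(W, Q))
print("originals recovered:", np.allclose(X_hat, X))
```

In this toy setting the hidden map is exactly orthogonal, so the fit is exact. A real anonymization system is only approximated by such a map, which is why the abstract reports partial recovery of speaker identities rather than full recovery.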
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Wasserstein-Procrustes, Computer Science - Cryptography and Security, Procrustes Analysis, Voice Privacy, Computer Science - Sound, Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Automatic Speaker Verification, Cryptography and Security (cs.CR), [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR], Electrical Engineering and Systems Science - Audio and Speech Processing
