Downloads provided by UsageCounts
doi: 10.1109/wifs55849.2022.9975428 , 10.5281/zenodo.8139409 , 10.5281/zenodo.8139408 , 10.48550/arxiv.2209.14098
arXiv: 2209.14098
handle: 11588/926570
doi: 10.1109/wifs55849.2022.9975428 , 10.5281/zenodo.8139409 , 10.5281/zenodo.8139408 , 10.48550/arxiv.2209.14098
arXiv: 2209.14098
handle: 11588/926570
Thanks to recent advances in deep learning, sophisticated generation tools exist, nowadays, that produce extremely realistic synthetic speech. However, malicious uses of such tools are possible and likely, posing a serious threat to our society. Hence, synthetic voice detection has become a pressing research topic, and a large variety of detection methods have been recently proposed. Unfortunately, they hardly generalize to synthetic audios generated by tools never seen in the training phase, which makes them unfit to face real-world scenarios. In this work we aim at overcoming this issue by proposing a new detection approach that leverages only the biometric characteristics of the speaker, with no reference to specific manipulations. Since the detector is trained only on real data, generalization is automatically ensured. The proposed approach can be implemented based on off-the-shelf speaker verification tools. We test several such solutions on three popular test sets, obtaining good performance, high generalization ability and high robustness to audio impairment.
FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
| selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | 24 | |
| popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network. | Top 10% | |
| influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically). | Top 10% | |
| impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network. | Top 10% |
| views | 42 | |
| downloads | 28 |

Views provided by UsageCounts
Downloads provided by UsageCounts