An RFP Dataset for Real, Fake, and Partially Fake Audio Detection

Publication type: Part of book or chapter of book, Article, Conference object, Preprint. Date: 01 Jan 2024. Embargo end date: 01 Jan 2024. Language: English. Publisher: Springer Nature Singapore.

Authors: Abdulazeez AlAli; George Theodorakopoulos

doi: 10.1007/978-981-97-3973-8_1, 10.5281/zenodo.18473545, 10.5281/zenodo.10202142, 10.48550/arxiv.2404.17721, 10.5281/zenodo.10202141

arXiv: 2404.17721


Abstract

Recent advances in deep learning have enabled the creation of natural-sounding synthesised speech. However, attackers have also used these technologies to conduct attacks such as phishing. Numerous public datasets have been created to facilitate the development of effective detection models. However, the available datasets contain only entirely fake audio; therefore, detection models may miss attacks that replace a short section of real audio with fake audio. To address this problem, this paper presents the RFP dataset, which comprises five distinct audio types: partially fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real. The data are then used to evaluate several detection models, revealing that the available detection models incur a markedly higher equal error rate (EER) when detecting PF audio than when detecting entirely fake audio. The lowest EER recorded was 25.42%. Therefore, we believe that creators of detection models must seriously consider using datasets like RFP that include PF and other types of fake audio.
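The abstract reports detector performance as an equal error rate (EER): the error rate at the decision threshold where the rate of fake clips wrongly accepted equals the rate of real clips wrongly rejected. The following is a minimal sketch of how an EER can be approximated from detector scores by sweeping the threshold over the observed scores. It is not the paper's evaluation code; the function name `equal_error_rate`, the label convention (1 = real, 0 = fake), and the assumption that a higher score means "more likely real" are all illustrative choices.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER of a binary real/fake detector.

    Assumed convention (hypothetical, for illustration only):
    label 1 = real (bona fide), label 0 = fake, and a higher score
    means "more likely real". A clip is accepted as real when
    score >= threshold.
    """
    real = [s for s, y in zip(scores, labels) if y == 1]
    fake = [s for s, y in zip(scores, labels) if y == 0]
    if not real or not fake:
        raise ValueError("need at least one real and one fake sample")

    best_gap, best_eer = float("inf"), 1.0
    # Candidate thresholds: every observed score, plus one above the maximum
    # (which rejects everything).
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for t in candidates:
        far = sum(s >= t for s in fake) / len(fake)  # fakes wrongly accepted
        frr = sum(s < t for s in real) / len(real)   # reals wrongly rejected
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest and
        # report their mean as the EER estimate.
        if gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    # Perfectly separated scores give an EER of 0.
    print(equal_error_rate([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]))
```

Under this reading, an EER of 25.42% means that at the balancing threshold roughly one in four fake clips is accepted as real and roughly one in four real clips is rejected. Production evaluations typically interpolate along the ROC curve rather than picking the closest observed threshold, so this sketch is only an approximation.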

Related Organizations

Cardiff University
United Kingdom

Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Cryptography and Security, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Cryptography and Security (cs.CR), Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing

Impact (by BIP!)

- Selected citations: 5
- Popularity: Top 10%
- Influence: Top 10%
- Impulse: Top 10%


Open access route: Green