
This dataset is the test set (part 1 of 2) of the Codecfake dataset, corresponding to the manuscript "The Codecfake Dataset and Countermeasures for Universal Deepfake Audio Detection".

Abstract

With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for effective detection methods. Unlike traditional deepfake audio generation, which often involves multi-step processes culminating in vocoder usage, ALM directly utilizes neural codec methods to decode discrete codes into audio. Moreover, driven by large-scale data, ALMs exhibit remarkable robustness and versatility, posing a significant challenge to current audio deepfake detection (ADD) models. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method: the conversion from neural codec to waveform. We initially construct the Codecfake dataset, an open-source large-scale dataset including two languages, millions of audio samples, and various test conditions, tailored for ALM-based audio detection. Additionally, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minimum. Experimental results demonstrate that co-training on the Codecfake dataset and the vocoded dataset with the CSAM strategy yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models.

Codecfake Dataset

Due to platform restrictions on the size of Zenodo repositories, we have divided the Codecfake dataset into several subsets, as shown in the table below:

| Codecfake dataset | description | link |
| --- | --- | --- |
| training set (part 1 of 3) & label | train_split.zip & train_split.z01 - train_split.z05 | https://zenodo.org/records/13838106 |
| training set (part 2 of 3) | train_split.z06 - train_split.z10 | https://zenodo.org/records/13841652 |
| training set (part 3 of 3) | train_split.z11 - train_split.z16 | https://zenodo.org/records/13853860 |
| development set | dev_split.zip & dev_split.z01 - dev_split.z02 | https://zenodo.org/records/13841216 |
| test set (part 1 of 2) | Codec test: C1.zip - C6.zip & ALM test: A1.zip - A3.zip | https://zenodo.org/records/13838823 |
| test set (part 2 of 2) | Codec unseen test: C7.zip | https://zenodo.org/records/11125029 |

Countermeasure

The source code of the countermeasure and the pre-trained model are available on GitHub: https://github.com/xieyuankun/Codecfake. The Codecfake dataset and pre-trained model are licensed under the CC BY-NC-ND 4.0 license.
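The training and development archives in the table above are distributed as split zip files (e.g. train_split.zip plus train_split.z01 - train_split.z16), so the parts must be recombined before extraction. The sketch below is a minimal, illustrative way to do this, assuming Info-ZIP's `zip` and `unzip` are installed and all parts have already been downloaded into the working directory; the output file and folder names are not prescribed by the dataset authors.

```python
import subprocess

# Minimal sketch: merge the split training archive and extract it.
# Assumes Info-ZIP's `zip`/`unzip` are on PATH and that train_split.zip
# plus train_split.z01 - train_split.z16 were downloaded from the Zenodo
# records listed above. Output names are illustrative.
subprocess.run(
    ["zip", "-s", "0", "train_split.zip", "--out", "train_full.zip"],
    check=True,
)  # convert the split archive into a single .zip
subprocess.run(
    ["unzip", "-q", "train_full.zip", "-d", "codecfake_train"],
    check=True,
)  # extract the merged archive into codecfake_train/
```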
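The abstract reports performance as an average Equal Error Rate (EER), the operating point where the false acceptance rate equals the false rejection rate. For reference, here is a minimal sketch of how EER is commonly computed from per-utterance detector scores; it uses scikit-learn's ROC utilities and an assumed label convention (1 = real, 0 = fake), and is not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Equal Error Rate from binary labels and detection scores.

    labels: 1 = real/bonafide, 0 = fake (assumed convention).
    scores: higher score = more likely real.
    """
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))  # point closest to FPR == FNR
    return (fpr[idx] + fnr[idx]) / 2.0     # e.g. 0.00616 corresponds to 0.616%

# Toy usage:
# eer = compute_eer(np.array([1, 1, 0, 0]), np.array([0.9, 0.8, 0.3, 0.1]))
```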
