
FAD: A Chinese Dataset for Fake Audio Detection

Keywords: Deepfake, Fake Audio, Fake Audio Detection, Dataset

Research data > Dataset | 09 Jun 2022 | Chinese | Publisher: Zenodo

Authors: Haoxin Ma; Jiangyan Yi

doi: 10.5281/zenodo.6641573, 10.5281/zenodo.6623227, 10.5281/zenodo.6635521


Abstract

Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset under additive noise conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (FAD) for studying more generalized detection methods. Twelve mainstream speech generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected, and noise is added at five different signal-to-noise ratios (SNRs). The FAD dataset can be used not only for fake audio detection, but also for identifying the algorithms that generated fake utterances, for audio forensics. Baseline results are presented with analysis. The results show that building fake audio detection methods that generalize well remains challenging. The FAD dataset is publicly available. The source code of the baselines is available on GitHub: https://github.com/ADDchallenge/FAD

The FAD dataset is designed to evaluate methods for fake audio detection, fake algorithm recognition, and other relevant studies. To better study the robustness of these methods under the noisy conditions they face in real life, we construct a corresponding noisy dataset. The full FAD dataset therefore consists of two versions: a clean version and a noisy version. Both versions are divided into disjoint training, development and test sets in the same way. There is no speaker overlap across these three subsets. Each test set is further divided into a seen and an unseen test set. The unseen test sets evaluate how well the methods generalize to unknown types. Both the real and the fake audio in the unseen test set are unknown to the model. For the noisy version, we select three noise databases for simulation. Additive noise is added to each audio in the clean dataset at 5 different SNRs. The additive noise for the unseen test set and for the remaining subsets comes from different noise databases.
In each version of the FAD dataset, there are 138400 utterances in the training set, 14400 utterances in the development set, 42000 utterances in the seen test set, and 21000 utterances in the unseen test set. More detailed statistics are given in Table 2.

Clean Real Audios Collection

To eliminate interference from irrelevant factors, we collect clean real audio from two sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.

Clean Fake Audios Generation

We select 11 representative speech synthesis methods to generate fully fake audio, plus one method that generates partially fake audio.

Noisy Audios Simulation

The noisy audio is intended to quantify the robustness of the methods under noisy conditions. To simulate real-life scenarios, we artificially sample noise signals and add them to the clean audio at 5 different SNRs: 0dB, 5dB, 10dB, 15dB and 20dB. Additive noise is selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.

This data set is licensed with a CC BY-NC-ND 4.0 license.

You can cite the data using the following BibTeX entry:

@inproceedings{ma2022fad,
  title={FAD: A Chinese Dataset for Fake Audio Detection},
  author={Haoxin Ma and Jiangyan Yi and Chenglong Wang and Xinrui Yan and Jianhua Tao and Tao Wang and Shiming Wang and Le Xu and Ruibo Fu},
  booktitle={Submitted to the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
  year={2022},
}
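The noise simulation described above (additive noise mixed into clean audio at 0, 5, 10, 15 and 20 dB SNR) can be illustrated with a minimal sketch. This is not the authors' code: the function names `mix_at_snr` and `measured_snr_db` are hypothetical, signals are plain lists of float samples, and tiling a short noise clip to the length of the utterance is an assumption, since the description does not say how noise segments were sampled.

```python
import math

# The five SNR levels listed in the dataset description.
SNR_LEVELS_DB = [0, 5, 10, 15, 20]


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean so the mixture has the requested SNR in dB.

    Assumption: noise shorter than the clean signal is tiled, then cut
    to the same length.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    noise = list(noise)
    if len(noise) < len(clean):
        noise = noise * (len(clean) // len(noise) + 1)
    noise = noise[:len(clean)]

    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise segment is silent")

    # SNR_dB = 10 * log10(P_clean / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(mean_power(clean) / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR of a mixture, treating (noisy - clean) as the noise component."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


if __name__ == "__main__":
    # Toy signals standing in for a clean utterance and a noise clip.
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [((t * 37) % 11) - 5.0 for t in range(4000)]
    for snr in SNR_LEVELS_DB:
        noisy = mix_at_snr(clean, noise, snr)
        print(snr, measured_snr_db(clean, noisy))
```

The gain is derived from the power ratio rather than from peak amplitudes, so the target SNR holds on average over the whole utterance; a real pipeline would also need to handle resampling and clipping, which this sketch leaves out.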

References

- Jiangyan Yi, Ye Bai, Jianhua Tao, Zhengkun Tian, Chenglong Wang, Tao Wang, and Ruibo Fu. Half-truth: A partially fake audio detection dataset. Interspeech 2021.
- Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, and Hao Zheng. Aishell-1: An open-source mandarin Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), pages 1–5. IEEE, 2017.
- Xin Xu Shaoji Zhang Ming Li Yao Shi, Hui Bu. Aishell-3: A multi-speaker mandarin tts corpus and the baselines. 2015.
- Zhiyong Zhang Dong Wang, Xuewei Zhang. Thchs-30: A free chinese speech corpus, 2015.
- Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, et al. Open source magicdata-ramc: A rich annotated mandarin conversational (ramc) speech dataset. arXiv preprint arXiv:2203.16844, 2022.
- Guoning Hu and DeLiang Wang. A tandem algorithm for pitch estimation and voiced speech segregation. IEEE Transactions on Audio, Speech, and Language Processing, 18(8):2067–2079, 2010.
- Andrew Varga and Herman JM Steeneken. Assessment for automatic speech recognition: Ii. noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech communication, 12(3):247–251, 1993.
- Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A multi-device dataset for urban acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), pages 9–13, November 2018.

Related Organizations

Chinese Academy of Sciences
China (People's Republic of)


Related research

- CFAD: A Chinese Dataset for Fake Audio Detection (2022), relation: IsAmongTopNSimilarDocuments
- CFAD: A Chinese Dataset for Fake Audio Detection (2022), relation: IsVersionOf

Impact by BIP!

- citations: 0. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.