ZENODO
Dataset . 2022
License: CC BY
Data sources: Datacite
Data sources: ZENODO

CFAD: A Chinese Dataset for Fake Audio Detection

Authors: Haoxin Ma; Jiangyan Yi;

Abstract

Fake audio detection is a growing concern, and several relevant datasets have been designed for research. However, there is no standard public Chinese dataset under complex conditions. In this paper, we aim to fill this gap and design a Chinese fake audio detection dataset (CFAD) for studying more generalized detection methods. Twelve mainstream speech-generation techniques are used to generate fake audio. To simulate real-life scenarios, three noise datasets are selected for adding noise at five different signal-to-noise ratios, and six codecs are considered for audio transcoding. The CFAD dataset can be used not only for fake audio detection but also for recognizing the algorithms used to generate fake utterances, for audio forensics. Baseline results are presented with analysis. The results show that fake audio detection methods that generalize well remain challenging. The CFAD dataset is publicly available on GitHub: https://github.com/ADDchallenge/CFAD

The CFAD dataset considers 12 types of fake audio: 11 are generated by different speech synthesis techniques, and the remaining one is a partially fake type. Partially fake audio is completely different from synthesized speech and can therefore better evaluate how well a detection model generalizes to unknown types. The real audio is collected from 6 different corpora to increase the diversity of the real-category distribution, which makes models less prone to artifacts from a single database. For robustness evaluation, we additionally simulate background noise and media codecs that might occur in real life and provide detailed labels, including fake type, real source, noise type, signal-to-noise ratio (SNR), and media codec.

Overall, the CFAD dataset consists of three versions: clean, noisy, and codec. Each version is divided into disjoint training, development, and test sets in the same way. There is no speaker overlap across these three subsets.
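The noise simulation mentioned above (noise added at several signal-to-noise ratios) can be illustrated with a minimal sketch. The description does not give the mixing formula, so this assumes a common approach: crop a random noise segment and scale it so that the ratio of clean power to noise power matches the target SNR. The function name `mix_at_snr` and the looping of short noise clips are assumptions for illustration, not the authors' code.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db, rng=None):
    """Return clean + scaled noise so the mixture has the requested SNR in dB.

    Hypothetical helper: the dataset description does not specify how the
    noise is sampled or scaled.
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Assumption: loop the noise if it is shorter than the utterance.
    if len(noise) < len(clean):
        reps = int(np.ceil(len(clean) / len(noise)))
        noise = np.tile(noise, reps)

    # Take a random crop of the noise with the same length as the utterance.
    start = rng.integers(0, len(noise) - len(clean) + 1)
    segment = noise[start:start + len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(segment ** 2)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("clean and noise must both contain non-zero samples")

    # Solve 10*log10(clean_power / (gain**2 * noise_power)) = snr_db for gain.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * segment
```

Calling `mix_at_snr(clean, noise, 10)` would give a mixture at 10 dB SNR; lower values mean the noise is louder relative to the speech.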
Each test set is further divided into seen and unseen test sets. The unseen test sets evaluate how well methods generalize to unknown types. Both the real audio and the fake audio in the unseen test set are unknown to the model. For the noisy version, we select three noise databases for simulation. Additive noise is added to each audio in the clean dataset at 5 different SNRs. The additive noise for the unseen test set and for the remaining subsets comes from different noise databases. For the codec version, we select six different codecs, two of which are applied to the unseen test set.

In each version (clean, noisy, and codec) of the CFAD dataset, there are 138400 utterances in the training set, 14400 utterances in the development set, 42000 utterances in the seen test set, and 21000 utterances in the unseen test set.

Clean Real Audios Collection

To eliminate interference from irrelevant factors, we collect clean real audio from two kinds of sources: 5 open resources from the OpenSLR platform (http://www.openslr.org/12/) and one self-recorded dataset.

Clean Fake Audios Generation

We select 11 representative speech synthesis methods to generate the fake audio, plus one partially fake audio type.

Noisy Audios Simulation

The noisy version aims to quantify the robustness of the methods under noisy conditions. To simulate real-life scenarios, we artificially sample noise signals and add them to clean audio at 5 different SNRs: 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB. Additive noise is selected from three noise databases: PNL 100 Nonspeech Sounds, NOISEX-92, and TAU Urban Acoustic Scenes.

Audio Transcoding

The codec version aims to quantify the robustness of the methods under different format conversions. We select a total of six codecs. For the training, development, and seen test sets of the codec version, mp3, flac, ogg, and m4a are used. For the unseen test set of the codec version, aac and wma are used.
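As a rough sketch of how the codec pools above might drive transcoding: pick a codec at random from the pool for the subset, encode the WAV file, then decode it back to WAV. The subset label `"unseen_test"`, the helper names, and the ffmpeg command lines are assumptions for illustration. The commands rely on ffmpeg choosing a default encoder from the output file extension; the description gives no encoder, bit-rate, or sample-rate settings, so none are set here.

```python
import random
from pathlib import Path

# Codec pools as listed in the dataset description.
SEEN_CODECS = ("mp3", "flac", "ogg", "m4a")   # training, development, seen test
UNSEEN_CODECS = ("aac", "wma")                # unseen test only

def pick_codec(subset, rng):
    """Pick one codec at random from the pool for this subset.

    The subset label "unseen_test" is a hypothetical naming choice.
    """
    pool = UNSEEN_CODECS if subset == "unseen_test" else SEEN_CODECS
    return rng.choice(pool)

def roundtrip_commands(wav_in, codec, work_dir, wav_out):
    """Build two ffmpeg argument lists: WAV -> codec file, codec file -> WAV.

    Assumption: ffmpeg infers the container and default encoder from the
    file extension; the actual settings used for CFAD are not documented.
    """
    encoded = Path(work_dir) / (Path(wav_in).stem + "." + codec)
    encode = ["ffmpeg", "-y", "-i", str(wav_in), str(encoded)]
    decode = ["ffmpeg", "-y", "-i", str(encoded), str(wav_out)]
    return encode, decode
```

On a machine with ffmpeg installed, each returned list could be passed to `subprocess.run(cmd, check=True)` in order. Building the commands separately from running them keeps the codec selection easy to inspect and to log as a label.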
Transcoding is applied to the audio in the clean version. Each clean audio is randomly transcoded with one of the candidate codecs and then converted back to a WAV file using the ffmpeg toolkit.

This dataset is licensed under a CC BY-NC-ND 4.0 license. You can cite the data using the following BibTeX entry.

References

  • Jiangyan Yi, Ye Bai, Jianhua Tao, Zhengkun Tian, Chenglong Wang, Tao Wang, and Ruibo Fu. Half-truth: A partially fake audio detection dataset. Interspeech 2021.
  • Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, and Hao Zheng. Aishell-1: An open-source mandarin Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), pages 1–5. IEEE, 2017.
  • Xin Xu Shaoji Zhang Ming Li Yao Shi, Hui Bu. Aishell-3: A multi-speaker mandarin tts corpus and the baselines. 2015.
  • Zhiyong Zhang Dong Wang, Xuewei Zhang. Thchs-30: A free chinese speech corpus, 2015.
  • Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, et al. Open source magicdata-ramc: A rich annotated mandarin conversational (ramc) speech dataset. arXiv preprint arXiv:2203.16844, 2022.
  • Guoning Hu and DeLiang Wang. A tandem algorithm for pitch estimation and voiced speech segregation. IEEE Transactions on Audio, Speech, and Language Processing, 18(8):2067–2079, 2010.
  • Andrew Varga and Herman JM Steeneken. Assessment for automatic speech recognition: Ii. noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems. Speech communication, 12(3):247–251, 1993.
  • Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen. A multi-device dataset for urban acoustic scene classification. In Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018), pages 9–13, November 2018.

Keywords

Deepfake, Fake Audio, Fake Audio Detection, Dataset
