The main purpose of this data set is to facilitate research into audio DeepFakes. Such generated media files are increasingly used for impersonation attempts and online harassment. The data set consists of 88,600 generated audio clips (16-bit PCM WAV). All samples were generated by one of four neural network architectures:

- MelGAN
- Parallel WaveGAN
- Multi-Band MelGAN
- WaveGlow

Additionally, we examined a larger version of MelGAN and investigated a variant of Multi-Band MelGAN that computes its auxiliary loss over the full audio signal instead of over its sub-bands.

Collection Process

For WaveGlow, we use the official implementation (commit 8afb643) together with the official pre-trained network from PyTorch Hub. For the remaining networks, we use a popular implementation available on GitHub (commit 12c677e), which also offers pre-trained models. We used these pre-trained networks to generate samples that resemble their respective training distributions, LJ Speech and JSUT.

To sample the data set, we first extract Mel spectrograms from the original audio files using the pre-processing scripts of the corresponding repositories. We then feed these Mel spectrograms to the respective models to obtain the generated audio.

This data set is licensed under a CC-BY-SA 4.0 license.

This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy -- EXC 2092 CASA -- 390781972.
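The two-step collection process described above (Mel spectrogram extraction, then vocoding) can be sketched as follows. This is a generic NumPy illustration, not the repositories' actual pre-processing scripts: the triangular mel filterbank construction, the parameter defaults (n_fft=1024, hop=256, 80 mel bins, 22,050 Hz), and the log compression are common TTS conventions assumed here for illustration.

```python
import numpy as np

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale (a common construction;
    # the exact parameters used by the repositories' scripts may differ).
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, center):       # rising slope
            fb[i, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling slope
            fb[i, k] = (right - k) / max(right - center, 1)
    return fb

def extract_mel(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    # Magnitude STFT via framing + FFT, projected onto the mel basis,
    # followed by log compression; returns shape (n_mels, n_frames).
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop:i * hop + n_fft] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    mel = mel_filterbank(sr, n_fft, n_mels) @ spec.T
    return np.log(np.clip(mel, 1e-5, None))

# One second of audio at 22,050 Hz yields an 80-bin mel spectrogram.
wav = np.random.randn(22050)
mel = extract_mel(wav)   # shape (80, 83) with these parameters
```

In the actual pipeline, the resulting spectrogram would then be fed to one of the pre-trained vocoders (e.g., WaveGlow) to synthesize the generated clip; that inference step is omitted here since it depends on the specific model checkpoints.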
References

- Kumar, Kundan, et al. "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis." arXiv preprint arXiv:1910.06711 (2019).
- Yamamoto, Ryuichi, Eunwoo Song, and Jae-Min Kim. "Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram." ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.
- Yang, Geng, et al. "Multi-Band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech." 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2021.
- Prenger, Ryan, Rafael Valle, and Bryan Catanzaro. "WaveGlow: A Flow-Based Generative Network for Speech Synthesis." ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.
- Sonobe, Ryosuke, Shinnosuke Takamichi, and Hiroshi Saruwatari. "JSUT Corpus: Free Large-Scale Japanese Speech Corpus for End-to-End Speech Synthesis." arXiv preprint arXiv:1711.00354 (2017).
Keywords: DeepFake, Machine Learning, Audio, Signal Processing