
Biodenoising_validation is a benchmark dataset for animal vocalization denoising. It contains 62 pairs of clean animal vocalizations and noise excerpts. The data sources are listed in the clean.csv and noise.csv files. The dataset is provided at two sample rates, 16,000 Hz and 44,100 Hz. Each sample-rate subfolder contains clean, noise, and noisy subfolders, along with the metadata describing the data sources.

Methodology
We programmatically create mixtures by pairing vocalizations with noise at random signal-to-noise ratios (SNR) drawn from a uniform distribution between -5 and 10 dB (2.8 dB on average). To ensure reproducibility, a fixed random seed controls the SNRs of the mixtures. The samples are between 1 and 60 seconds long (20.14 seconds on average). We split the vocalizations and noises into two lists: underwater (11 vocalizations and 26 noises) and terrestrial (51 vocalizations and 20 noises). Within each list, we sort the vocalizations and the noise samples by duration and pair them in that order, e.g., matching the longest calls with the longest noises.

Citation
Miron, Marius, Sara Keen, Jen-Yu Liu, Benjamin Hoffman, Masato Hagiwara, Olivier Pietquin, Felix Effenberger, Maddie Cusimano. "Biodenoising: animal vocalization denoising without access to clean data."

License
This dataset is provided for educational purposes only, and the material it contains should not be used for any commercial purpose without the express permission of the copyright holders.

Contact
info@mariusmiron.com
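The mixture-generation procedure described in the Methodology above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' generation script. The seed value, the `(name, duration)` input format, the proportional rank mapping used when the vocalization and noise lists have different lengths, and the looping and trimming of each noise to its vocalization's length are all hypothetical choices, because the description does not specify them. SNR is taken as 10·log10 of the ratio of mean clean power to mean noise power.

```python
import math
import random


def draw_snrs(n, seed=0, low=-5.0, high=10.0):
    """Draw n SNR values in dB from U(low, high) using a fixed seed.
    The default seed (0) is a placeholder; the dataset's actual seed is not stated."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


def pair_by_duration(vocalizations, noises):
    """Pair (name, duration) items so the longest calls get the longest noises.
    Hypothetical rule for unequal list sizes: the i-th longest vocalization is
    mapped proportionally onto the noise ranking (noises may be reused or skipped)."""
    vocs = sorted(vocalizations, key=lambda item: item[1], reverse=True)
    nois = sorted(noises, key=lambda item: item[1], reverse=True)
    pairs = []
    for i, (voc_name, _) in enumerate(vocs):
        j = i * len(nois) // len(vocs)  # always < len(nois)
        pairs.append((voc_name, nois[j][0]))
    return pairs


def power(x):
    """Mean power of a signal given as a list of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that 10*log10(P_clean / P_noise) == snr_db, then add it.
    Assumption: the noise is looped and trimmed to the clean signal's length."""
    reps = math.ceil(len(clean) / len(noise))
    noise = (list(noise) * reps)[: len(clean)]
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [scale * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


if __name__ == "__main__":
    sr = 16000  # one of the dataset's two sample rates
    clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
    noise_rng = random.Random(1)
    noise = [noise_rng.gauss(0.0, 1.0) for _ in range(sr // 2)]
    snr = draw_snrs(1)[0]
    noisy, scaled = mix_at_snr(clean, noise, snr)
    measured = 10 * math.log10(power(clean) / power(scaled))
    print(f"target {snr:.2f} dB, measured {measured:.2f} dB")
```

The target SNR is reached with a single gain applied to the noise, so the measured SNR of each mixture matches the drawn value up to floating-point error. Because the dataset's handling of length mismatches is unspecified, replacing the loop-and-trim step with another strategy only affects `mix_at_snr`.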
