Created by Yu Wang, Mark Cartwright, and Juan Pablo Bello

Publication

If using this data in academic work, please cite the following paper, which presented this dataset:

Y. Wang, M. Cartwright, and J. P. Bello. "Active Few-Shot Learning for Sound Event Detection", INTERSPEECH, 2022

Description

SONYC-FSD-SED is an open dataset of programmatically mixed audio clips that simulates audio data in an environmental sound monitoring system, where sound class occurrences and co-occurrences exhibit seasonal periodic patterns. We use recordings collected from the Sound of New York City (SONYC) acoustic sensor network as backgrounds, and single-labeled clips in the FSD50K dataset as foreground events, to generate 576,591 10-second strongly-labeled soundscapes with Scaper (including 111,294 additional test soundscapes for the sampling-window experiment). Instead of sampling foreground sound events uniformly, we simulate the occurrence probability of each class at different times in a year, creating more realistic temporal characteristics.

Source material and annotations

Due to the large size of the dataset, instead of releasing the raw audio files, we release the source material and soundscape annotations in JAMS format, which can be used to reproduce SONYC-FSD-SED using Scaper with the script in the project repository.

Background material from SONYC recordings

We pick a sensor from the SONYC sensor network and subsample from recordings it collected within a year (2017). We categorize these ∼550k 10-second clips into 96 bins based on timestamps, where each bin represents a unique combination of the month of the year (12), the type of day (weekday or weekend), and the time of day (four 6-hour blocks). Next, we run a pre-trained urban sound event classifier over all recordings and filter out clips with active sound classes. We do not filter out footstep and bird since they appear too frequently; instead, we remove these two classes from the foreground sound material.
Then from each bin, we choose the clip with the lowest sound pressure level, yielding 96 background clips.

Foreground material from FSD50K

We follow the same filtering process as in FSD-MIX-SED to get the subset of FSD50K with short single-labeled clips. In addition, we remove two classes, "Chirp_and_tweet" and "Walk_and_footsteps", that exist in our SONYC background recordings. This results in 87 sound classes. vocab.json contains the list of 87 classes; each class is labeled by its index in the list. Indices 0-42 are used for training, 43-56 for validation, and 57-86 for testing.

Occurrence probability modelling

For each class, we model its occurrence probability within a year. We use von Mises probability density functions to simulate the probability distribution over different weeks in a year and hours in a day, considering their cyclic characteristics: \(f(x \mid \mu, \kappa) = \frac{e^{\kappa \cos(x - \mu)}}{2\pi I_0(\kappa)}\), where \(I_0(\kappa)\) is the modified Bessel function of the first kind of order \(0\), and \(\mu\) and \(1/\kappa\) are analogous to the mean and variance in the normal distribution. We randomly sample \((\mu_{year}, \mu_{day})\) from \([-\pi, \pi]\) and \((\kappa_{year}, \kappa_{day})\) from \([0, 10]\). We also randomly assign \(p_{weekday} \in [0, 1]\) and \(p_{weekend} = 1 - p_{weekday}\) to simulate the probability distribution over different days in a week. Finally, we get the probability distribution over the entire year with a 1-hour resolution. At a given timestamp, we integrate \(f_{year}\) and \(f_{day}\) over the 1-hour window and multiply them together with \(p_{weekday}\) or \(p_{weekend}\), depending on the day. To speed up the subsequent sampling process, we scale the final probability distribution using a temperature parameter randomly sampled from \([2, 3]\).

Files

SONYC_FSD_SED.source.tar.gz: 96 SONYC backgrounds and 10,158 foreground sounds in `.wav` format. The original file size is 2GB.
SONYC_FSD_SED.annotations.tar.gz: 465,467 JAMS files. The original file size is 57GB.
SONYC_FSD_SED_add_test.annotations.tar.gz: 111,294 JAMS files for additional test data. The original file size is 14GB.
vocab.json: 87 classes.
occ_prob_per_cl.pkl: Occurrence probability for each foreground sound class.

References

[1] J. P. Bello, C. T. Silva, O. Nov, R. L. DuBois, A. Arora, J. Salamon, C. Mydlarz, and H. Doraiswamy. "SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution", Commun. ACM, 2019.
[2] E. Fonseca, X. Favory, J. Pons, F. Font, and X. Serra. "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.
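
The occurrence probability modelling described above can be illustrated with a minimal Python sketch for a single hypothetical class. It is not the authors' script. Several details are assumptions made only for illustration: the linear mapping of the hour of the year and the hour of the day onto \([-\pi, \pi]\), the midpoint-rule integration, and the form of the temperature scaling (here \(p^{1/T}\) followed by renormalization), which the description does not specify. All function names are hypothetical.

```python
import math
import random
from datetime import datetime, timedelta


def i0(kappa, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / (k + 1) ** 2
    return total


def window_mass(lo, hi, mu, kappa, steps=8):
    """Midpoint-rule integral of the von Mises density over [lo, hi]."""
    width = (hi - lo) / steps
    mids = (lo + (i + 0.5) * width for i in range(steps))
    area = width * sum(math.exp(kappa * math.cos(m - mu)) for m in mids)
    return area / (2.0 * math.pi * i0(kappa))


def angle(position, period):
    """Map a position in [0, period] linearly onto [-pi, pi] (assumed mapping)."""
    return -math.pi + 2.0 * math.pi * position / period


def occurrence_probability(year=2017, seed=0):
    """Hourly occurrence probability of one hypothetical class over a year."""
    rng = random.Random(seed)
    mu_year, mu_day = rng.uniform(-math.pi, math.pi), rng.uniform(-math.pi, math.pi)
    kappa_year, kappa_day = rng.uniform(0, 10), rng.uniform(0, 10)
    p_weekday = rng.random()
    p_weekend = 1.0 - p_weekday

    start = datetime(year, 1, 1)
    n_hours = (datetime(year + 1, 1, 1) - start).days * 24
    # The 24 hour-of-day masses repeat every day, so compute them once.
    day_mass = [window_mass(angle(h, 24), angle(h + 1, 24), mu_day, kappa_day)
                for h in range(24)]
    probs = []
    for h in range(n_hours):
        ts = start + timedelta(hours=h)
        year_mass = window_mass(angle(h, n_hours), angle(h + 1, n_hours),
                                mu_year, kappa_year)
        p_day_type = p_weekend if ts.weekday() >= 5 else p_weekday
        probs.append(year_mass * day_mass[ts.hour] * p_day_type)
    return probs


def apply_temperature(probs, temperature):
    """Assumed scaling: p ** (1 / T), renormalized to sum to 1."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]
```

For example, `apply_temperature(occurrence_probability(seed=1), random.uniform(2, 3))` would yield one flattened hourly distribution. Under this assumed scaling, a temperature above 1 flattens the distribution.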

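The 96-bin grouping of background clips (12 months, weekday or weekend, four 6-hour blocks) can be sketched as a mapping from a timestamp to a bin index. The function name and the ordering of the bins are assumptions made for illustration; the description only states which three attributes define a bin.

```python
from datetime import datetime, timedelta


def background_bin(ts: datetime) -> int:
    """Map a timestamp to one of 12 * 2 * 4 = 96 bins (assumed ordering)."""
    month = ts.month - 1                       # 0..11
    day_type = 1 if ts.weekday() >= 5 else 0   # 0 = weekday, 1 = weekend
    block = ts.hour // 6                       # 0..3, four 6-hour blocks
    return (month * 2 + day_type) * 4 + block


# Example: collect the distinct bins hit by every hour of 2017.
start = datetime(2017, 1, 1)
bins = {background_bin(start + timedelta(hours=h)) for h in range(365 * 24)}
print(len(bins))
```

Choosing the quietest clip per bin would then amount to grouping clips by this index and taking the minimum sound pressure level within each group.
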
Related Organizations

New York University
United States
New Jersey Institute of Technology
United States

Keywords

sound event detection, sound event classification, audio dataset, environmental sound
