ZENODO
Dataset . 2021
License: CC BY
Data sources: Datacite
Data sources: ZENODO

GISE-51

Authors: Yadav, Sarthak; Foster, Mary Ellen;
Abstract

GISE-51 is an open dataset of 51 isolated sound events based on the FSD50K dataset. The release also includes the GISE-51-Mixtures subset, a dataset of 5-second soundscapes with up to three sound events synthesized from GISE-51. The GISE-51 release attempts to address some of the shortcomings of recent sound event datasets, providing an open, reproducible benchmark for future research and the freedom to adapt the included isolated sound events for domain-specific applications, which was not possible using existing large-scale weakly labelled datasets. The GISE-51 release also includes accompanying code for baseline experiments, which can be found at https://github.com/SarthakYadav/GISE-51-pytorch.

Citation

If you use the GISE-51 dataset and/or the released code, please cite our paper:

  • Sarthak Yadav and Mary Ellen Foster, "GISE-51: A scalable isolated sound events dataset", arXiv:2103.12306, 2021.

Since GISE-51 is based on FSD50K, if you use GISE-51, please also cite the FSD50K paper:

  • Eduardo Fonseca, Xavier Favory, Jordi Pons, Frederic Font, Xavier Serra, "FSD50K: an Open Dataset of Human-Labeled Sound Events", arXiv:2010.00475, 2020.

About GISE-51 and GISE-51-Mixtures

The following sections summarize key characteristics of the GISE-51 and GISE-51-Mixtures datasets, including details left out of the paper.

GISE-51

  • Three subsets: train, val and eval, with 12465, 1716 and 2176 utterances, respectively. The subsets are consistent with the FSD50K release.
  • Encompasses 51 sound classes from the FSD50K release. View meta/lbl_map.csv for the complete vocabulary.
  • The dataset was obtained from FSD50K using the following steps:
      • Unsmearing annotations to obtain single instances with a single label, using the provided metadata and ground truth in FSD50K.
      • Manual inspection to qualitatively evaluate shortlisted utterances.
      • Volume-threshold based automated silence filtering using sox. Different volume thresholds were selected for various sound event class bins by trial and error. silence_thresholds.txt lists the class bins and their corresponding volume thresholds. Files that were determined by sox to contain no audio at all were manually clipped. Code for performing silence filtering can be found in scripts/strip_silence_sox.py in the code repository.
      • Re-evaluating sound event classes, removing those with too few samples and merging those with high inter-class ambiguity.

GISE-51-Mixtures

  • Synthetic 5-second soundscapes with up to 3 events, created using Scaper.
  • Weighted sampling with replacement is used for sound event selection, effectively oversampling events with very few samples. The synthetic soundscapes generated this way have a near-equal number of annotations per sound event.
  • The val and eval sets contain 10000 soundscapes each.
  • The final train set contains 60000 soundscapes. We also provide training sets with 5k-100k soundscapes.
  • GISE-51-Mixtures is our proposed subset that can be used to benchmark the performance of future works.

LICENSE

All audio clips (i.e., those found in isolated_events.tar.gz) used in the preparation of the Glasgow Isolated Events Dataset (GISE-51) are designated Creative Commons and were obtained from FSD50K. The source data in isolated_events.tar.gz is based on the FSD50K dataset, which is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The GISE-51 dataset (including GISE-51-Mixtures) is a curated, processed and generated preparation, and is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. The license is specified in the LICENSE-DATASET file in license.tar.gz.

Baselines

Several sound event recognition experiments were conducted, establishing baseline performance on several prominent convolutional neural network architectures. The experiments are described in Section 4 of our paper, and the implementation for reproducing these experiments is available at https://github.com/SarthakYadav/GISE-51-pytorch.
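The weighted sampling with replacement described for GISE-51-Mixtures can be illustrated with a minimal sketch. This is not the authors' generation code and does not use Scaper. It assumes inverse-class-frequency weights, which is only one plausible weighting scheme, and all function names, file names and clip counts below are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one sampling weight per clip, inversely proportional to its class size."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_events(clips, labels, k, rng):
    """Draw k clips with replacement so that every class is roughly equally likely."""
    weights = inverse_frequency_weights(labels)
    return rng.choices(clips, weights=weights, k=k)


# Hypothetical toy inventory: 90 "dog" clips but only 10 "gong" clips.
labels = ["dog"] * 90 + ["gong"] * 10
clips = [f"{label}_{i}.wav" for i, label in enumerate(labels)]

rng = random.Random(0)  # fixed seed so the draw is repeatable
drawn = sample_events(clips, labels, k=10000, rng=rng)

# Count how often each class was drawn; rare classes are oversampled.
drawn_counts = Counter(name.split("_")[0] for name in drawn)
print(drawn_counts)
```

Because each clip's weight is the reciprocal of its class size, the weights within every class sum to the same value, so rare classes are drawn about as often as common ones, which is the effect described above.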
Files

GISE-51 is available as a collection of several tar archives. All audio files are 16-bit PCM, 22050 Hz. The following lists the contents of these files in detail:

  • isolated_events.tar.gz: The core GISE-51 isolated events dataset, containing train, val and eval subfolders.
  • meta.tar.gz: contains lbl_map.json.
  • noises.tar.gz: contains the background noises used for GISE-51-Mixtures soundscape generation.
  • mixtures_jams.tar.gz: contains annotation files in .jams format that, alongside isolated_events.tar.gz and noises.tar.gz, can be used to regenerate the exact GISE-51-Mixtures soundscapes. (Optional; we provide the complete set of GISE-51-Mixtures soundscapes as independent tar archives.)
  • train.tar.gz: GISE-51-Mixtures train set, containing 60k synthetic soundscapes.
  • val.tar.gz: GISE-51-Mixtures val set, containing 10k synthetic soundscapes.
  • eval.tar.gz: GISE-51-Mixtures eval set, containing 10k synthetic soundscapes.
  • train_*.tar.gz: tar archives containing training mixtures with varying numbers of soundscapes, used primarily in Section 4.1 of the paper, which compares val mAP performance vs. the number of training soundscapes. A helper script, prepare_mixtures_lmdb.sh, is provided in the code release to prepare data for the experiments in Section 4.1.
  • pretrained-models.tar.gz: contains model checkpoints for all experiments conducted in the paper. More information on these checkpoints can be found in the code release README.
      • experiments_60k_mixtures: model checkpoints from Section 4.2 of the paper.
      • exported_weights_60k: ResNet-18 and EfficientNet-B1 exported as plain state_dicts for use in transfer learning experiments.
      • experiments_audioset: checkpoints from the AudioSet Balanced experiments (Section 4.3.1).
      • experiments_vggsound: checkpoints from Section 4.3.2 of the paper.
      • experiments_esc50: ESC-50 dataset checkpoints, from Section 4.3.3.
  • license.tar.gz: contains dataset license info.
  • silence_thresholds.txt: contains the volume thresholds for various sound event bins used for silence filtering.

Contact

In case of queries and clarifications, feel free to contact Sarthak at s.yadav.2@research.gla.ac.uk. (Adding [GISE-51] to the subject of the email would be appreciated!)
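As a quick sanity check on the audio format stated under Files (16-bit PCM, 22050 Hz), an extracted clip could be inspected with Python's standard wave module. This is a minimal sketch that assumes the clips are stored as WAV files, which the description above does not state explicitly. The function name is hypothetical and not part of the code release.

```python
import wave

EXPECTED_RATE = 22050      # Hz, as stated in the Files section
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def matches_gise51_format(path):
    """Return True if the WAV file at `path` is 16-bit PCM sampled at 22050 Hz."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
        )
```

Calling matches_gise51_format on a file path returns a boolean. Files that the wave module cannot parse raise wave.Error instead of returning False.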

Keywords

sound event recognition, audio dataset
