Dataset used in COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations

Keywords: audio classification, deep neural network, co-aligned autoencoders, spectrograms, audio representation learning, contrastive loss

Favory, Xavier; Drossos, Konstantinos; Virtanen, Tuomas; Serra, Xavier


Dataset (2020)

Data sources: Datacite, ZENODO, Research.fi


Dataset, published 09 Jun 2020 (Finland). Publisher: Zenodo


DOI: 10.5281/zenodo.3887261, 10.5281/zenodo.3887260


Abstract

This dataset consists of two HDF5 files containing pre-computed log-mel spectrograms that were used to train audio embedding models. The dataset is split into a training set and a validation set containing 170,793 and 19,103 spectrogram patches, respectively, each accompanied by multi-hot encoded tags from a vocabulary of 1000 tags provided by Freesound users. More details can be found in "COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio Representations" by X. Favory, K. Drossos, T. Virtanen, and X. Serra. The code is available in a GitHub repository.

License: This dataset is derived from content in the Freesound collection. All sounds are released under one of the following Creative Commons (CC) licenses: CC0, CC-BY, CC-S+ (Sampling Plus), or CC-BY-NC. We credit the authors of all sounds used in the dataset, together with their corresponding licenses, in the attributions.txt file.
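To illustrate what "multi-hot encoded tags" means, the following is a minimal sketch, not the authors' code: each patch's tags become a 0/1 vector with one position per vocabulary entry. The five-tag vocabulary and tag names below are hypothetical stand-ins for the real 1000-tag vocabulary, and the ordering of tags in the actual HDF5 files is not described here, so it is an assumption.

```python
from typing import Dict, List, Sequence


def build_tag_index(vocabulary: Sequence[str]) -> Dict[str, int]:
    """Map each tag in a fixed vocabulary to its column position."""
    return {tag: i for i, tag in enumerate(vocabulary)}


def multi_hot(tags: Sequence[str], tag_index: Dict[str, int]) -> List[int]:
    """Encode a sound's tags as a 0/1 vector of length len(tag_index).

    Tags outside the vocabulary are silently ignored (an assumption made
    for this sketch).
    """
    vector = [0] * len(tag_index)
    for tag in tags:
        idx = tag_index.get(tag)
        if idx is not None:
            vector[idx] = 1
    return vector


def decode(vector: Sequence[int], vocabulary: Sequence[str]) -> List[str]:
    """Recover the tag names whose positions are set to 1."""
    return [vocabulary[i] for i, v in enumerate(vector) if v]


# Hypothetical 5-tag vocabulary standing in for the real 1000-tag one.
vocab = ["drum", "guitar", "piano", "rain", "voice"]
index = build_tag_index(vocab)
encoded = multi_hot(["rain", "piano", "thunder"], index)
print(encoded)                 # [0, 0, 1, 1, 0]
print(decode(encoded, vocab))  # ['piano', 'rain']
```

Because several tags can be active at once, this is a multi-label target rather than a one-hot class label; with the real vocabulary, every vector would have length 1000.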

Country

Finland

Related Organizations

Tampere University, Finland
Universitat Pompeu Fabra, Spain


Impact (by BIP!)

- Selected citations: 3 (derived from selected sources)
- Popularity: Top 10% (current impact/attention in the research community, based on the citation network)
- Influence: Average (overall impact in the research community, based on the citation network over time)
- Impulse: Average (initial momentum directly after publication, based on the citation network)