This is the validation set for Task 9, Language-Queried Audio Source Separation (LASS), of the DCASE 2024 Challenge. It is intended solely for evaluating LASS methods during the model development stage and must not be used for training.

The set consists of 1000 audio files sourced from Freesound [1] and uploaded between April and October 2023. Each audio file has been manually annotated with three captions; annotators were instructed to describe the content of each clip in 5-20 words, similar to the caption style of the Clotho [3] and AudioCaps [4] datasets. The tags of each audio file were verified and revised according to the FSD50K [2] sound event categories. Each audio file has been segmented into a 10-second clip and downsampled to 16 kHz.

== Details ==

The audio files are provided in the archive lass_validation.zip, and the associated metadata (including tags and captions) in the JSON file lass_validation.json.

Participants evaluate their LASS models on synthetic mixtures during the development stage. Specifically, given an audio clip A1 and its corresponding caption C, an additional audio clip A2 is selected as background noise, and the two are combined into a mixture A3. Given A3 and C as inputs, the LASS system is expected to separate the A1 source. The revised tag information is used to ensure that the two clips in each mixture do not share overlapping sound source classes.

Three thousand synthetic mixtures, with signal-to-noise ratios (SNR) ranging from -15 dB to 15 dB, are used for validation during LASS model development. These mixtures can be generated from the provided CSV file lass_synthetic_validation.csv (a minimal mixing sketch is given after the references).

The evaluation tool is available at: https://github.com/Audio-AGI/dcase2024_task9_baseline/blob/main/dcase_evaluator.py

== References ==

[1] Fonseca E., Pons J., Favory X., et al. Freesound Datasets: a platform for the creation of open audio datasets. International Society for Music Information Retrieval Conference (ISMIR), 2017.

[2] Fonseca E., Favory X., Pons J., et al. FSD50K: an open dataset of human-labeled sound events. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 30: 829-852.

[3] Drossos K., Lipping S., Virtanen T. Clotho: an audio captioning dataset. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020: 736-740.

[4] Kim C. D., Kim B., Lee H., et al. AudioCaps: generating captions for audios in the wild. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2019: 119-132.
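== Example: mixing two clips at a target SNR ==

The sketch below illustrates, in Python, how a source clip (A1) and a noise clip (A2) could be combined into a mixture (A3) at a chosen SNR. The exact mixing procedure and the column layout of lass_synthetic_validation.csv are defined by the challenge baseline; the file names and the SNR value used here are hypothetical placeholders and should be replaced with values from one CSV row.

# Minimal sketch: build one validation mixture at a target SNR.
# Assumes 16 kHz mono 10-second clips from lass_validation.zip; the file
# names and SNR below are hypothetical and should be taken from
# lass_synthetic_validation.csv.
import numpy as np
import soundfile as sf

def mix_at_snr(source: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the source-to-noise power ratio equals `snr_db`, then add it to `source`."""
    source_power = np.mean(source ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    gain = np.sqrt(source_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return source + gain * noise

source, sr = sf.read("lass_validation/source_clip.wav")  # A1 (hypothetical path)
noise, _ = sf.read("lass_validation/noise_clip.wav")     # A2 (hypothetical path)
mixture = mix_at_snr(source, noise, snr_db=5.0)          # A3
sf.write("mixture.wav", mixture, sr)

The scaling factor follows from requiring 10*log10(P_source / (gain^2 * P_noise)) = SNR, so the source is left untouched and only the noise level is adjusted.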