ZENODO
Dataset . 2026
License: CC BY
Data sources: ZENODO, Datacite

HumMusQA: A Human-written Music Understanding QA Benchmark Dataset

Authors: Weck, Benno; Puentes, Pablo; Poltronieri, Andrea; Prabhu, Satyajeet; Bogdanov, Dmitry

Abstract

HumMusQA is a benchmark dataset for evaluating music understanding in Large Audio-Language Models (LALMs). It contains 320 human-written multiple-choice questions created and validated by musically trained experts to test perception and interpretation of musical content.

This dataset accompanies the paper:

Benno Weck, Pablo Puentes, Andrea Poltronieri, Satyajeet Prabhu, and Dmitry Bogdanov. 2026. HumMusQA: A Human-written Music Understanding QA Benchmark Dataset. In Proceedings of the 4th Workshop on NLP for Music and Audio (NLP4MusA 2026), pages 58–67, Rabat, Morocco. Association for Computational Linguistics.

Files

  • HumMusQA.csv: Main dataset containing all questions. Columns: Song link, start time, end time, Question, True answer, Distractor 1, Distractor 2, Distractor 3, Main Category, Secondary Categories, Difficulty.
  • metadata.csv: Track metadata and licensing information. Columns: track_id, song_link, name, artist_name, album_name, license_ccurl.
  • audio_excerpts.zip: Trimmed audio excerpts corresponding to each question.
  • audio_full.zip: Full audio tracks.

Licensing

Each track follows its respective Creative Commons license, specified in metadata.csv. Users must comply with the license associated with each track.

Citation

If you use this dataset, please cite:

@inproceedings{weck-etal-2026-hummusqa,
    title = "{H}um{M}us{QA}: A Human-written Music Understanding {QA} Benchmark Dataset",
    author = "Weck, Benno and Puentes, Pablo and Poltronieri, Andrea and Prabhu, Satyajeet and Bogdanov, Dmitry",
    editor = "Epure, Elena V. and Oramas, Sergio and Doh, SeungHeon and Ramoneda, Pedro and Kruspe, Anna and Sordo, Mohamed",
    booktitle = "Proceedings of the 4th Workshop on {NLP} for Music and Audio ({NLP}4{M}us{A} 2026)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.nlp4musa-1.9/",
    doi = "10.18653/v1/2026.nlp4musa-1.9",
    pages = "58--67",
    ISBN = "979-8-89176-369-2",
    abstract = "The evaluation of music understanding in Large Audio-Language Models (LALMs) requires a rigorously defined benchmark that truly tests whether models can perceive and interpret music, a standard that current data methodologies frequently fail to meet. This paper introduces a meticulously structured approach to music evaluation, proposing a new dataset of 320 hand-written questions curated and validated by experts with musical training, arguing that such focused, manual curation is superior for probing complex audio comprehension. To demonstrate the use of the dataset, we benchmark six state-of-the-art LALMs and additionally test their robustness to uni-modal shortcuts."
}
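
The column layout of HumMusQA.csv listed under Files suggests a simple way to turn each row into a four-option multiple-choice item. The following is a minimal sketch of that idea, not the authors' evaluation code: the helper names (`read_rows`, `build_item`, `accuracy`) are hypothetical, the exact header spelling in the real CSV is assumed from the description, and the sample row is invented purely for illustration.

```python
import csv
import io
import random

# Column names as listed in the dataset description. Their exact spelling
# and capitalization in the real HumMusQA.csv header are an assumption.
QUESTION_COL = "Question"
ANSWER_COL = "True answer"
DISTRACTOR_COLS = ["Distractor 1", "Distractor 2", "Distractor 3"]
LETTERS = "ABCD"

# Invented example row, NOT taken from the dataset.
SAMPLE_CSV = (
    "Question,True answer,Distractor 1,Distractor 2,Distractor 3\n"
    "Which instrument plays the melody?,Flute,Violin,Trumpet,Piano\n"
)


def read_rows(fileobj):
    """Parse CSV text from an open file object into a list of dicts."""
    return list(csv.DictReader(fileobj))


def build_item(row, rng):
    """Shuffle the four options and record which letter is correct."""
    options = [row[ANSWER_COL]] + [row[col] for col in DISTRACTOR_COLS]
    rng.shuffle(options)
    correct = LETTERS[options.index(row[ANSWER_COL])]
    lines = [row[QUESTION_COL]]
    lines += [f"{letter}. {text}" for letter, text in zip(LETTERS, options)]
    return {"prompt": "\n".join(lines), "correct": correct}


def accuracy(items, predictions):
    """Fraction of items whose predicted letter matches the correct one."""
    if not items:
        return 0.0
    hits = sum(item["correct"] == pred for item, pred in zip(items, predictions))
    return hits / len(items)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the option order is reproducible
    rows = read_rows(io.StringIO(SAMPLE_CSV))
    item = build_item(rows[0], rng)
    print(item["prompt"])
    print("Correct option:", item["correct"])
```

To use it on the real file, one would pass `open("HumMusQA.csv", newline="", encoding="utf-8")` to `read_rows` (the encoding is an assumption). Shuffling the options per item avoids the correct answer always sitting in the same position, which matters when probing models for positional shortcuts.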

  • BIP!
    Impact by BIP!
    selected citations
    These citations are derived from selected sources.
    This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    0
    popularity
    This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
    Average
    influence
    This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
    Average
    impulse
    This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
    Average