
<h2>MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models</h2> <p>MuChoMusic is a benchmark for evaluating music understanding in multimodal audio-language models. It comprises 1,187 multiple-choice questions, all validated by human annotators, on 644 music tracks drawn from two publicly available music datasets. The questions span a wide variety of genres and assess knowledge and reasoning across several musical concepts and their cultural and functional contexts. Using the benchmark, the authors evaluate five open-source models, revealing pitfalls such as an over-reliance on the language modality and underscoring the need for better multimodal integration.</p> <h3>Note on Audio Files</h3> <p>This dataset does not include the audio files themselves. They can be downloaded from the two source datasets: the <a href="https://doi.org/10.5281/zenodo.10072001" target="_blank" rel="noopener noreferrer">Song Describer Dataset (SDD)</a> and <a href="https://www.kaggle.com/datasets/googleai/musiccaps" target="_blank" rel="noopener noreferrer">MusicCaps</a>. Please see the <a href="https://github.com/mulab-mir/muchomusic" target="_blank" rel="noopener noreferrer">code repository</a> for instructions on downloading the audio.</p> <h3>Citation</h3> <p>If you use this dataset, please cite our <a href="https://arxiv.org/abs/2408.01337" target="_blank" rel="noopener noreferrer">paper</a>:</p> <pre><code>@inproceedings{weck2024muchomusic, title={MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models}, author={Weck, Benno and Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Bogdanov, Dmitry}, booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)}, year={2024} }</code></pre> <p>Weck B, Manco I, Benetos E, Quinton E, Fazekas G, Bogdanov D. MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models. In: Kaneshiro B, Mysore G, Nieto O, Donahue C, Huang CZA, Lee JH, McFee B, McCallum M, editors. Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR 2024); 2024 Nov 10-14; San Francisco, USA.</p>
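<p>Because the benchmark is multiple-choice, evaluating a model reduces to comparing its selected option against the annotated answer. The sketch below shows this scoring step in Python; the field names (<code>question</code>, <code>options</code>, <code>answer_index</code>) are illustrative assumptions, not the dataset's actual schema — see the code repository for the real evaluation pipeline.</p>

```python
# Minimal sketch of scoring a model on multiple-choice questions.
# NOTE: the dict keys used here are hypothetical, not MuChoMusic's schema.

def accuracy(questions, predictions):
    """Fraction of questions where the predicted option index is correct."""
    correct = sum(
        1
        for q, pred in zip(questions, predictions)
        if pred == q["answer_index"]
    )
    return correct / len(questions)

# Toy example with two made-up questions.
questions = [
    {"question": "Which instrument carries the melody?",
     "options": ["Violin", "Trumpet", "Piano", "Flute"],
     "answer_index": 2},
    {"question": "What best describes the tempo of this track?",
     "options": ["Slow", "Moderate", "Fast", "Varying"],
     "answer_index": 0},
]
predictions = [2, 1]  # the model's chosen option per question
print(accuracy(questions, predictions))  # 0.5
```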
<h3>Keywords</h3> <p>Audio-language, Computer and Information Science, Multimodal, Multiple-choice, Benchmark, Music</p>
