Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Publication: Article, Conference object, Preprint. Date: 01 Jan 2024. Embargo end date: 01 Jan 2024. Publisher: ISMIR. Journal: CoRR, volume abs/2407.21531

Authors: Ziya Zhou; Yuhang Wu; Zhiyue Wu; Xinyue Zhang; Ruibin Yuan; Yinghao Ma; Lu Wang; +3 Authors

doi: 10.5281/zenodo.14876989, 10.48550/arxiv.2407.21531, 10.5281/zenodo.14877281, 10.5281/zenodo.14876983, 10.5281/zenodo.14877277

arXiv: 2407.21531


Abstract

Symbolic music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain, including understanding and generation. Yet scant research explores how these LLMs perform on advanced music understanding and conditioned generation, especially from the perspective of multi-step reasoning, which is a critical aspect of the conditioned, editable, and interactive human-computer co-creation process. This study conducts a thorough investigation of LLMs' capabilities and limitations in symbolic music processing. We find that current LLMs perform poorly in song-level multi-step music reasoning and typically fail to leverage learned music knowledge when addressing complex musical tasks. An analysis of the LLMs' responses clearly highlights their strengths and weaknesses. Our findings suggest that advanced musical capability is not intrinsically acquired by LLMs, and that future research should focus more on bridging the gap between music knowledge and reasoning to improve the co-creation experience for musicians.

Accepted by ISMIR 2024

Related Organizations

Hong Kong University of Science and Technology (香港科技大學), China (People's Republic of)
Shenzhen University (SZU), China (People's Republic of)
Hong Kong University of Science and Technology, Hong Kong


Keywords

FOS: Computer and information sciences, Sound (cs.SD), Computer Science - Computation and Language, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computation and Language (cs.CL), Computer Science - Sound, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing, Multimedia (cs.MM)

Impact by BIP!

- Selected citations: 0
- Popularity: Average
- Influence: Average
- Impulse: Average


SDGs: 4. Education

Related to Research communities

Knowmad Institut