Scaling Self-Supervised Representation Learning for Symbolic Piano Performance

Publication: Article, Conference object, Preprint | 01 Jan 2025 | Embargo end date: 01 Jan 2025 | Publisher: ISMIR | Journal: CoRR, volume abs/2506.23869

Authors: Louis Bradshaw; Honglu Fan; Alexander Spangher; Stella Biderman; Simon Colton

doi: 10.5281/zenodo.17706483, 10.48550/arxiv.2506.23869, 10.5281/zenodo.17706484, 10.5281/zenodo.17811409

arXiv: 2506.23869


Abstract

We study the capabilities of generative autoregressive transformer models trained on large amounts of symbolic solo-piano transcriptions. After first pretraining on approximately 60,000 hours of music, we use a comparatively smaller, high-quality subset to finetune models to produce musical continuations, perform symbolic classification tasks, and produce general-purpose contrastive MIDI embeddings by adapting the SimCLR framework to symbolic music. When evaluating piano continuation coherence, our generative model outperforms leading symbolic generation techniques and remains competitive with proprietary audio generation models. On MIR classification benchmarks, frozen representations from our contrastive model achieve state-of-the-art results in linear probe experiments, while direct finetuning demonstrates the generalizability of pretrained representations, often requiring only a few hundred labeled examples to specialize to downstream tasks.
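As a rough illustration of the contrastive objective that SimCLR-style training relies on, the sketch below computes the NT-Xent loss over a batch of paired embeddings, where each pair stands for two views of the same piece. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the temperature value, and the toy vectors are hypothetical, and the abstract does not say how the two views of a MIDI sequence are produced or how the encoder yields embeddings.

```python
import math


def _normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(views_a, views_b, temperature=0.1):
    """NT-Xent loss for paired embeddings (hypothetical helper, not from the paper).

    views_a[i] and views_b[i] are embeddings of two views of the same item;
    every other embedding in the batch acts as a negative.
    """
    n = len(views_a)
    z = [_normalize(v) for v in list(views_a) + list(views_b)]
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the matching view
        # Similarities to every embedding except the anchor itself.
        logits = [
            sum(p * q for p, q in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
            if j != i
        ]
        pos_logit = sum(p * q for p, q in zip(z[i], z[pos])) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - pos_logit
    return total / (2 * n)


if __name__ == "__main__":
    # Toy embeddings (assumed values, for illustration only).
    a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    b = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]]
    print(nt_xent_loss(a, b))
```

The loss is small when matching views sit close together relative to the rest of the batch, and grows when pairs are mismatched. Embeddings trained this way can then be frozen and fed to a linear classifier, which is the linear probe setting the abstract refers to.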

ISMIR (2025)

Keywords

Machine Learning, FOS: Computer and information sciences, Sound (cs.SD), Sound, Artificial Intelligence (cs.AI), Artificial Intelligence, Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Audio and Speech Processing, Machine Learning (cs.LG)

Impact by BIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

