We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-instrument scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of heterogeneous music, namely the Lakh MIDI dataset. Despite differences between the two corpora, we find that this transfer learning procedure improves both quantitative and qualitative performance for our primary task.
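The two-stage transfer procedure the abstract describes (pre-train on the large, heterogeneous Lakh MIDI corpus, then fine-tune on the small target NES-MDB corpus) can be sketched with a toy count-based bigram language model. Everything here — `BigramLM`, the miniature corpora, and the fine-tuning `weight` knob — is a hypothetical illustration of the general recipe, not the authors' Transformer code:

```python
from collections import defaultdict
import math

class BigramLM:
    """Minimal count-based bigram language model: an illustrative,
    hypothetical stand-in for the paper's Transformer."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing
        self.counts = defaultdict(lambda: defaultdict(float))
        self.vocab = set()

    def train(self, seqs, weight=1.0):
        # `weight` lets fine-tuning examples count more than pre-training ones
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += weight
                self.vocab.update((a, b))

    def log_prob(self, a, b):
        row = self.counts[a]
        total = sum(row.values()) + self.alpha * len(self.vocab)
        return math.log((row[b] + self.alpha) / total)

    def perplexity(self, seqs):
        lp = n = 0
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                lp += self.log_prob(a, b)
                n += 1
        return math.exp(-lp / n)

# Stage 1: pre-train on a large, heterogeneous corpus (stand-in for Lakh MIDI).
pretrain_corpus = [list("abcabcabd"), list("abcabd")]
# Stage 2: fine-tune on the small target corpus (stand-in for NES-MDB).
target_corpus = [list("abd")]

transfer = BigramLM()
transfer.train(pretrain_corpus)            # pre-training
transfer.train(target_corpus, weight=5.0)  # fine-tuning

scratch = BigramLM()
scratch.train(target_corpus)               # target-only baseline

held_out = [list("abdabd")]
print(transfer.perplexity(held_out), scratch.perplexity(held_out))
```

In this toy setup the transferred model reaches a lower held-out perplexity than the model trained from scratch on the target data alone, mirroring the kind of quantitative gain the paper reports (with a Transformer, not a bigram model).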
Published as a conference paper at ISMIR 2019
Subjects: Sound (cs.SD), Machine Learning (cs.LG), Multimedia (cs.MM), Machine Learning (stat.ML), Audio and Speech Processing (eess.AS). FOS: Computer and information sciences; Electrical engineering, electronic engineering, information engineering.
| Indicator | Value |
| --- | --- |
| Selected citations | 1 |
| Popularity (current attention, citation network) | Average |
| Influence (overall impact, citation network) | Average |
| Impulse (momentum shortly after publication) | Average |
| Views | 10 |
| Downloads | 8 |

Views and downloads provided by UsageCounts