doi: 10.5281/zenodo.8328558 , 10.5281/zenodo.8328559 , 10.48550/arxiv.2311.03621 , 10.5281/zenodo.8345314
arXiv: 2311.03621
handle: 10216/154771
https://aimc2023.pubpub.org/pub/latent-spaces-tonal-music

Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key's component pitches, as drawn in music cognition. In detail, we compare the latent spaces of different VAE corpus encodings (Piano roll, MIDI, ABC, Tonnetz, and DFT of pitch and pitch-class distributions) in providing a pitch space for key relations that aligns with cognitive distances. We evaluate the model performance of these encodings using objective metrics that capture accuracy, mean squared error (MSE), KL divergence, and computational cost. The ABC encoding performs best in reconstructing the original data, while the Pitch DFT seems to capture more information in the latent space. Furthermore, an objective evaluation of 12 major and minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that Pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space in which overlapping objects within a key form fuzzy clusters, which impose a well-defined order of structural significance or stability, i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchical levels (e.g., notes and chords). The implementation of our VAE and the encodings framework are made available online.
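The "Pitch DFT" encoding referred to in the abstract can be illustrated with a minimal sketch: taking the discrete Fourier transform of a 12-bin pitch-class distribution, whose coefficient magnitudes encode interval content (the fifth coefficient relates to the circle of fifths). The distribution values below are hypothetical, not taken from the paper's corpus; the authors' actual pipeline is in their released framework.

```python
import numpy as np

# Hypothetical 12-bin pitch-class distribution (C, C#, ..., B),
# e.g. note-duration weights of one chorale segment, normalized to sum to 1.
pc_dist = np.array([0.25, 0.0, 0.12, 0.0, 0.13, 0.13,
                    0.0, 0.2, 0.0, 0.1, 0.0, 0.07])

# DFT of the pitch-class distribution. Coefficient 0 is just the total mass;
# coefficients 1..6 carry the non-redundant interval-content information.
spectrum = np.fft.fft(pc_dist)
magnitudes = np.abs(spectrum[1:7])  # the six informative coefficient magnitudes
```

Because the input is real-valued, coefficients 7..11 mirror 1..5, so the six magnitudes above summarize the distribution; transposing the segment changes only the coefficients' phases, not these magnitudes.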
FOS: Computer and information sciences, Computer Science - Machine Learning, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering, electronic engineering, information engineering, Computer Science - Sound, Computer Science - Multimedia, Electrical Engineering and Systems Science - Audio and Speech Processing, Machine Learning (cs.LG), Multimedia (cs.MM)
| Indicator | Value |
| --- | --- |
| Selected citations (derived from selected sources; an alternative to the "Influence" indicator) | 1 |
| Popularity (the "current" attention an article receives in the research community, based on the underlying citation network) | Average |
| Influence (the overall/total impact of an article in the research community, based on the underlying citation network, diachronically) | Average |
| Impulse (the initial momentum of an article directly after its publication, based on the underlying citation network) | Average |
| Views | 21 |
| Downloads | 15 |

Views and downloads provided by UsageCounts.