
Local key estimation (LKE) is an important yet challenging task in music information retrieval because it involves a high level of musical abstraction, which entails ambiguity and low inter-annotator agreement. Relying on small datasets with a single annotation may introduce not only dataset bias but also annotator bias. To address these problems, we propose a novel, annotation-free evaluation strategy for LKE. To this end, we exploit datasets in which multiple versions of the same musical work are available. We investigate the models' consistency across versions, expecting an effective and robust model to output similar predictions for different versions of the same work. In our experiments, we study the behavior of the proposed cross-version consistency measure using different models and datasets. The results indicate a strong correlation between cross-version consistency and the models' effectiveness on in-domain data as well as their generalization to out-of-domain data. Our further studies show that, while correlated with common evaluation metrics, cross-version consistency also captures different aspects of model behavior and thus serves as an additional figure of merit for evaluating LKE models.
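
To make the idea of cross-version consistency concrete, the following is a minimal sketch rather than the measure used in the paper, whose exact definition is not given here. It assumes that the key predictions for each version have already been aligned to a common musical time axis (for example, one label per measure) and scores consistency as the mean pairwise label agreement across versions. The function names and the toy label sequences are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(a, b):
    """Fraction of aligned positions at which two label sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cross_version_consistency(versions):
    """Mean pairwise agreement over all pairs of versions of one work.

    Assumption: agreement is plain label equality; no annotation is used.
    """
    pairs = list(combinations(versions, 2))
    if not pairs:
        raise ValueError("need at least two versions")
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical per-measure key predictions for three versions of one work
# (uppercase = major key, lowercase = minor key).
versions = [
    ["C", "C", "G", "G"],
    ["C", "C", "G", "a"],
    ["C", "C", "C", "G"],
]
score = cross_version_consistency(versions)
print(f"cross-version consistency: {score:.3f}")
```

A score of 1.0 would mean that the model predicts identical key sequences for every version. Plain label equality treats all disagreements alike; a musically informed variant could instead weight confusions between closely related keys less severely.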
