
MGPHot Embeddings and Index Reconstruction Scripts

This repository provides precomputed embeddings for the six models described in the original paper (https://arxiv.org/pdf/2509.06936), evaluated on three benchmark datasets: MGPHot, MagnaTagATune, and MTG-Jamendo.

The embeddings are stored in embeddings_autotagging.zip.

We also provide a mirror of the scripts that reconstruct the canonical indices with full metadata, in mirror_reconstruct.zip. All metadata is already available in the original GitHub repository: https://github.com/MTG/MGPHot-audio

Purpose

- Provide precomputed embeddings for the six models evaluated in the paper.
- Allow researchers to rebuild the canonical indices locally with the full metadata.
- Ensure reproducibility while respecting the original dataset licenses.

Note: The number of embedding files may vary across models. Some extractors were designed to process the entire dataset, while others only generate embeddings for tracks associated with at least one of the selected tags.

How to Reconstruct

Run the reconstruction script, python reconstruct_index.py. It rebuilds the canonical indices, verifies the outputs against the provided .md5 checksums, and prints a summary report.

License

Code is released under the MIT License. Metadata mappings follow the original dataset’s license (CC BY-NC-SA 4.0). Do not upload reconstructed indices or audio anywhere online.

Citation

If you use these embeddings or scripts in your research, please cite the original paper:

    @misc{ramoneda2025benchmark,
      title = {Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets},
      author = {Pedro Ramoneda and Pablo Alonso-Jim{\'e}nez and Sergio Oramas and Xavier Serra and Dmitry Bogdanov},
      year = {2025},
      eprint = {2509.06936},
      archivePrefix = {arXiv},
      primaryClass = {cs.SD},
      url = {https://arxiv.org/abs/2509.06936}
    }

Note on the durability of the benchmark

We are open to sharing specific missing audio files with research institutions to ensure reproducibility.
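
The reconstruction script already verifies its outputs against the provided .md5 checksums. For readers who want to check a single file by hand, the following is a minimal Python sketch. It assumes each checksum file follows the common md5sum layout (hex digest first, optionally followed by a file name); the actual layout of the provided .md5 files is not documented here. The function names and the file names in the usage note are hypothetical and are not taken from the repository.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected_md5(md5_path):
    """Read the expected digest from a checksum file.

    Assumption: md5sum-style content, so the digest is the first
    whitespace-separated token in the file.
    """
    text = Path(md5_path).read_text(encoding="utf-8").strip()
    return text.split()[0].lower()


def verify(data_path, md5_path):
    """Return True when the file's digest matches the one recorded in md5_path."""
    return md5_of_file(data_path) == read_expected_md5(md5_path)
```

For example, `verify("index.json", "index.json.md5")` returns True when the file is intact; both file names are placeholders.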
