
What does this dataset contain?

This dataset comprises nearly 900 million organically selected, time-stamped listening events from 4 million anonymized Deezer users, recorded in 2023. It covers 50,000 anonymized songs, drawn from the platform’s most popular, along with their multimodal pre-trained embedding vectors (Audio and SVD) generated by our internal model. All files are provided in Parquet format and can be read with the `pandas.read_parquet` function.

What could this dataset be used for?

This dataset can be applied to multimodal collaborative filtering and multimodal sequential recommendation tasks, including both next-item and next-session prediction.

Citation

If you use this dataset, please cite the following paper:

@inproceedings{tran-recsys2025,
  title = {"Beyond the past": Leveraging Audio and Human Memory for Sequential Music Recommendation},
  author = {Viet-Anh Tran and Bruno Sguerra and Gabriel Meseguer-Brocal and Lea Briand and Manuel Moussallam},
  booktitle = {Proceedings of the 19th ACM Conference on Recommender Systems},
  year = {2025}
}
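As a rough illustration of the next-item prediction use case mentioned above, the sketch below turns a table of time-stamped listening events into one (history, target) pair per user. The column names `user_id`, `item_id`, and `ts`, and the helper `build_next_item_examples`, are assumptions made for this example; they are not taken from the dataset files and should be replaced with the actual schema.

```python
import pandas as pd


def build_next_item_examples(events: pd.DataFrame,
                             user_col: str = "user_id",
                             item_col: str = "item_id",
                             time_col: str = "ts") -> pd.DataFrame:
    """Turn time-stamped listening events into (history, target) pairs.

    The default column names are hypothetical placeholders, not the
    dataset's documented schema.
    """
    # Order each user's events chronologically.
    ordered = events.sort_values([user_col, time_col])
    # Collect one item sequence per user.
    sequences = ordered.groupby(user_col, sort=True)[item_col].agg(list)
    rows = []
    for user, items in sequences.items():
        if len(items) < 2:
            continue  # need at least one history item and one target
        # Hold out the last listened item as the prediction target.
        rows.append({"user": user, "history": items[:-1], "target": items[-1]})
    return pd.DataFrame(rows, columns=["user", "history", "target"])


# Toy events standing in for a real file (values are made up).
toy_events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "item_id": [10, 11, 12, 20, 21],
    "ts": [3, 1, 2, 5, 4],
})
examples = build_next_item_examples(toy_events)
```

With a Parquet engine such as pyarrow installed, the same function could be applied to a downloaded file via `build_next_item_examples(pd.read_parquet("listening_events.parquet"))`, where the file name is a placeholder. Holding out only the last event per user is one common evaluation split, not necessarily the protocol used in the cited paper.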
recommender system, music dataset, audio, sequential recommendation, multimodal, repeat consumption, listening events
