"We share our code online": Why this is not enough to ensure reproducibility and progress in recommender systems research

This record hosts the list of papers and the evaluation data that are part of the RecSys 2025 reproducibility analysis "We share our code online": Why this is not enough to ensure reproducibility and progress in recommender systems research. More information can also be found at the corresponding website. Abstract: Issues with reproducibility have been identified as a major factor hampering progress in recommender systems research. In response, researchers increasingly share the code of their models. However, the provision of only the code of the proposed model is usually not sufficient to ensure reproducibility. In many works, the central claim is that a new model is advancing the state-of-the-art. Thus, it is crucial that the entire experiment is reproducible, including the configuration and the results of the considered baselines. With this work, our goal is to gauge the level of reproducibility in algorithms research in recommender systems. We systematically analyzed the reproducibility level of 65 papers published at a top-ranked conference during the last three years. Our results are sobering. While the model code is shared in about two thirds of the papers, the code of the baselines is provided only in eight cases. The hyperparameters of the baselines are reported even less frequently, and how these were exactly determined is not explained in any paper. As a result, it is commonly not only impossible to reproduce the full result tables reported in the papers, it is also unclear if the claimed improvements over the state-of-the-art were actually achieved. Overall, we conclude that the research community has not reached the required level of reproducibility yet. We therefore call for more rigorous reproducibility standards to ensure progress in this field.

Related Organizations

University of Klagenfurt
Austria
University of Copenhagen
Denmark
TH Köln – University of Applied Sciences
Germany

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average