A Critical Study on Data Leakage in Recommender System Offline Evaluation

Publication: Article, Preprint. 07 Feb 2023. Embargo end date: 01 Jan 2020. Singapore. English.
Publisher: Association for Computing Machinery (ACM)
Journal: ACM Transactions on Information Systems, volume 41, pages 1-27 (ISSN: 1046-8188, eISSN: 1558-2868)

Authors: Yitong Ji; Aixin Sun; Jie Zhang; Chenliang Li

doi: 10.1145/3569930, 10.48550/arxiv.2010.11060

arXiv: 2010.11060

handle: 10356/170569


Abstract

Recommender models are hard to evaluate, particularly in the offline setting. In this article, we provide a comprehensive and critical analysis of the data leakage issue in recommender system offline evaluation. Data leakage arises when evaluation does not observe the global timeline, e.g., when the train/test data split does not follow the global timeline. As a result, a model learns from user-item interactions that would not be available at prediction time. We first show the temporal dynamics of user-item interactions along the global timeline, then explain why data leakage exists for collaborative filtering models. Through carefully designed experiments, we show that all models indeed recommend future items that are not available at the time point of a test instance, as a result of data leakage. The experiments are conducted with four widely used baseline models (BPR, NeuMF, SASRec, and LightGCN) on four popular offline datasets (MovieLens-25M, Yelp, Amazon-music, and Amazon-electronic), adopting the leave-last-one-out data split. We further show that data leakage does impact models' recommendation accuracy. Their relative performance order thus becomes unpredictable with different amounts of leaked future data in training. To evaluate recommendation systems realistically in the offline setting, we propose a timeline scheme, which calls for a revisit of recommendation model design.
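The leakage mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' code: the toy interactions, the function names, and the single-cutoff `global_timeline_split` are all assumptions made for illustration, and the paper's own timeline scheme may differ from this simple cutoff. The sketch applies a leave-last-one-out split, then lists the training interactions that carry a later timestamp than a given test instance.

```python
from collections import defaultdict

# Hypothetical toy data: (user, item, timestamp) on one global timeline.
interactions = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 3),
    ("u2", "b", 4), ("u2", "d", 5), ("u2", "e", 6),
]

def leave_last_one_out(rows):
    """Hold out each user's most recent interaction as the test instance."""
    by_user = defaultdict(list)
    for user, item, ts in rows:
        by_user[user].append((ts, item))
    train, test = [], []
    for user, events in by_user.items():
        events.sort()  # order each user's events by timestamp
        for ts, item in events[:-1]:
            train.append((user, item, ts))
        ts, item = events[-1]
        test.append((user, item, ts))
    return train, test

def leaked_training_rows(train, test):
    """Map each test instance to the training rows timestamped after it."""
    return {
        (user, item, ts): [row for row in train if row[2] > ts]
        for user, item, ts in test
    }

def global_timeline_split(rows, cutoff):
    """Assumed alternative: train strictly before `cutoff`, test on the rest."""
    train = [r for r in rows if r[2] < cutoff]
    test = [r for r in rows if r[2] >= cutoff]
    return train, test

train, test = leave_last_one_out(interactions)
leaks = leaked_training_rows(train, test)
for instance, future_rows in leaks.items():
    print(instance, "sees future training rows:", future_rows)
```

In this toy data, the test instance for `u1` at time 3 is evaluated by a model trained on `u2`'s interactions at times 4 and 5, which lie in its future. Splitting at one global cutoff removes such rows from training by construction, at the cost of leaving some users without a test instance.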

Country

Singapore

Related Organizations

Wuhan University
China (People's Republic of)
Nanyang Technological University
Singapore

Keywords

FOS: Computer and information sciences, 330, Collaborative Filtering, Data Mining, Information Retrieval (cs.IR), 004, Engineering::Computer science and engineering, Computer Science - Information Retrieval

Impact by BIP!

Selected citations: 63. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Popularity: Top 1%. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
Influence: Top 10%. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
Impulse: Top 1%. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.