Name: Information Retrieval in the Age of Generative AI: The RGB Model
Keywords: Performance (cs.PF), FOS: Computer and information sciences, Computer Science - Performance, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, AI; modelling; risks; information retrieval; Web answering; Information quality; Large Language Models; Retrieval-Augmented Generation; Automation bias; Stack Exchange, Information Retrieval (cs.IR), Computer Science - Information Retrieval

Publication: Article, Preprint, Conference object
Date: 13 Jul 2025
Embargo end date: 01 Jan 2025
Publisher: ACM
Journal: Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval

Authors: Michele Garetto; Alessandro Cornacchia; Franco Galante; Emilio Leonardi; Alessandro Nordio; Alberto Tarable;

doi: 10.1145/3726302.3730008, 10.48550/arxiv.2504.20610

arXiv: http://arxiv.org/abs/2504.20610

handle: 11583/2999757

Abstract

The advent of Large Language Models (LLMs) and generative AI is fundamentally transforming information retrieval and processing on the Internet, bringing both great potential and significant concerns regarding content authenticity and reliability. This paper presents a novel quantitative approach to shed light on the complex information dynamics arising from the growing use of generative AI tools. Despite their significant impact on the digital ecosystem, these dynamics remain largely uncharted and poorly understood. We propose a stochastic model to characterize the generation, indexing, and dissemination of information in response to new topics. This scenario particularly challenges current LLMs, which often rely on real-time Retrieval-Augmented Generation (RAG) techniques to overcome their static knowledge limitations. Our findings suggest that the rapid pace of generative AI adoption, combined with increasing user reliance, can outpace human verification, escalating the risk of inaccurate information proliferation across digital resources. An in-depth analysis of Stack Exchange data confirms that high-quality answers inevitably require substantial time and human effort to emerge. This underscores the considerable risks associated with generating persuasive text in response to new questions and highlights the critical need for responsible development and deployment of future generative AI tools.

To be presented at ACM SIGIR 25

Related Organizations

Polytechnic University of Turin, Italy
University of Turin, Italy

Impact by BIP!

- Selected citations: 0. These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Popularity: Average. This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.
- Influence: Average. This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).
- Impulse: Average. This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.
