Retrieval-Augmented Generation (RAG) systems enhance the performance of Large Language Models
(LLMs) by incorporating external information fetched from a retriever component. While traditional
approaches prioritize retrieving “relevant” documents, our research reveals that these documents can
be a double-edged sword. We explore the counterintuitive benefits of integrating noisy, non-relevant
documents into the retrieval process. In particular, we conduct an analysis of how different types of
retrieved documents—relevant, distracting, and random—affect the overall effectiveness of RAG systems.
Our findings show that including random documents, often perceived as noise, can significantly
improve LLM accuracy, with gains of up to 35%. Conversely, highly scored but non-relevant documents
returned by the retriever degrade performance. These insights challenge conventional retrieval
strategies and suggest a paradigm shift towards rethinking information retrieval for neural models.
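The analysis contrasts three context conditions: gold (relevant) documents, distracting documents that the retriever scores highly but that do not contain the answer, and documents sampled at random from the corpus. A minimal sketch of how such contexts might be assembled is shown below; the function name, prompt template, and data layout are illustrative assumptions, not the authors' actual code.

```python
import random


def build_context(question, gold_docs, retrieved_docs, random_pool,
                  mode, k=3, seed=0):
    """Assemble a RAG prompt under one of three document conditions:
      'relevant'    -- gold documents that contain the answer,
      'distracting' -- top-scored retrieved documents that are not gold,
      'random'      -- documents sampled uniformly from the corpus.
    Illustrative reconstruction only (names and structure are assumed)."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    if mode == "relevant":
        docs = gold_docs[:k]
    elif mode == "distracting":
        # keep retriever order, drop anything that is actually relevant
        docs = [d for d in retrieved_docs if d not in gold_docs][:k]
    elif mode == "random":
        docs = rng.sample(random_pool, k)
    else:
        raise ValueError(f"unknown mode: {mode}")
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Swapping only the `mode` argument while holding the question fixed lets one measure how each document type affects downstream answer accuracy.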
Publication details
2024, Proceedings of the 14th Italian Information Retrieval Workshop (IIR 2024), Pages 95-98 (volume: 3802)
Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation (04b: conference paper in proceedings volume)
Cuconasu Florin, Trappolini Giovanni, Siciliano Federico, Filice Simone, Campagnano Cesare, Maarek Yoelle, Tonellotto Nicola, Silvestri Fabrizio