RAG-DIVE: A Dynamic Approach for Multi-Turn Dialogue Evaluation in Retrieval-Augmented Generation
Full Paper
This program is tentative and subject to change.
Evaluating Retrieval-Augmented Generation (RAG) systems using static multi-turn datasets fails to capture the dynamic nature of real-world dialogues. Existing evaluation methods rely on predefined datasets, which restrict them to static, one-directional queries and limit their ability to capture the adaptive, context-dependent performance of RAG systems in interactive, multi-turn settings.
We therefore introduce RAG-DIVE, a Dynamic Interactive Validation and Evaluation approach that simulates user interactions with RAG systems. RAG-DIVE leverages an LLM to generate multi-turn conversations dynamically and is organized into three components across two stages. The dialogue generation stage consists of (1) the Conversation Generator, which simulates a user by creating multi-turn queries, and (2) the Conversation Validator, which filters and corrects invalid or low-quality outputs to ensure coherent conversations. The evaluation stage is handled by (3) the Conversation Evaluator, which assesses the RAG system’s performance across the entire dialogue and produces both per-turn metrics and multi-turn metrics that give an aggregated view of system behavior.
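The three components described above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function names (`generate_query`, `validate_query`, `score_turn`, `run_dialogue`, `evaluate`), the `Turn` record, the token-overlap score, and the mean as the multi-turn aggregate are all assumptions made for this example. In RAG-DIVE the generator and validator are LLM-driven; here they are replaced by deterministic stand-ins so the sketch runs without any model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional


@dataclass
class Turn:
    query: str
    answer: str
    score: float


def generate_query(history: List[Turn], topic: str) -> str:
    """Hypothetical Conversation Generator (an LLM call in practice)."""
    if not history:
        return f"What is {topic}?"
    return f"Can you expand on turn {len(history)} about {topic}?"


def validate_query(query: str, history: List[Turn]) -> Optional[str]:
    """Hypothetical Conversation Validator: repair or reject a query."""
    cleaned = query.strip()
    if not cleaned:
        return None  # reject empty output
    if not cleaned.endswith("?"):
        cleaned += "?"  # toy repair: make the text a question
    if any(turn.query == cleaned for turn in history):
        return None  # reject exact repeats of earlier turns
    return cleaned


def score_turn(query: str, answer: str) -> float:
    """Toy per-turn metric: share of query words that appear in the answer."""
    query_words = set(query.lower().rstrip("?").split())
    answer_words = set(answer.lower().split())
    return len(query_words & answer_words) / len(query_words) if query_words else 0.0


def run_dialogue(rag_system: Callable[[str], str], topic: str,
                 max_turns: int = 3) -> List[Turn]:
    """Generation stage: produce validated queries and collect RAG answers."""
    history: List[Turn] = []
    for _ in range(max_turns):
        query = validate_query(generate_query(history, topic), history)
        if query is None:
            break  # stop when the validator cannot accept the query
        answer = rag_system(query)
        history.append(Turn(query, answer, score_turn(query, answer)))
    return history


def evaluate(history: List[Turn]) -> Dict[str, object]:
    """Hypothetical Conversation Evaluator: per-turn and aggregated scores."""
    per_turn = [turn.score for turn in history]
    return {"per_turn": per_turn,
            "multi_turn": mean(per_turn) if per_turn else 0.0}
```

To try it, pass any callable that maps a query string to an answer string as `rag_system`, for example `evaluate(run_dialogue(my_rag, "caching"))`, where `my_rag` is a placeholder for the system under test. Because each query depends on the conversation so far, the dialogue is produced at evaluation time rather than read from a fixed dataset.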
We validated RAG-DIVE through two experimental setups. First, we tested a sample RAG system, including human evaluation of dialogue quality, repeated trials to assess consistency, and an ablation study showing that RAG-DIVE detects performance changes caused by system modifications. Second, we compared RAG-DIVE with a traditional static dataset evaluation on an industrial RAG system under different configurations to verify whether both approaches reveal similar performance trends.
Our findings demonstrate that RAG-DIVE facilitates dynamic, interaction-driven evaluation for multi-turn conversations, thereby advancing the assessment of RAG systems.
Mon 13 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 17:30 | Engineering GenAI Systems | Industry Track / Research Track / CAIN Program | at Oceania X | Chair(s): Karthik Vaidhyanathan (IIIT Hyderabad)
16:00 | 8m | Short-paper | Graphical-Probabilistic Modeling of Generative Flows in LLM-Native Software Systems | Short Paper, Research Track
16:08 | 12m | Full-paper | Cognition Envelopes for Bounded AI Reasoning in Autonomous UAS Operations | Full Paper, Research Track | Pedro Alarcon Granadeno (University of Notre Dame), Arturo Miguel Russell Bernal (University of Notre Dame), Sofia Nelson (University of Notre Dame), Demetrius Hernandez (University of Notre Dame), Maureen Petterson (University of Notre Dame), Michael Murphy (University of Notre Dame), Walter J. Scheirer (University of Notre Dame), Jane Cleland-Huang (University of Notre Dame) | Pre-print
16:20 | 8m | Industry talk | Current challenges and new prospects in software engineering practices for Geospatial AI | Short Paper, Industry Track
16:28 | 8m | Short-paper | The Physics of AI | Short Paper, Research Track | Scott Barnett (Applied Artificial Intelligence Initiative, Deakin University), Aleksandar Pasquini (Deakin University), Stefanus Kurniawan (Deakin University), Shangeetha Sivasothy (Applied Artificial Intelligence Institute, Deakin University), Rhys Hill (Deakin University), Rajesh Vasa (Deakin University, Australia)
16:36 | 12m | Full-paper | RAG-DIVE: A Dynamic Approach for Multi-Turn Dialogue Evaluation in Retrieval-Augmented Generation | Full Paper, Research Track | Lorenz Brehme (University of Innsbruck, Austria), Benedikt Dornauer (University of Innsbruck; University of Cologne), Jan-Henrik Böttcher (University of Hildesheim), Klaus Schmid, Ruth Breu (University of Innsbruck), Mircea-Cristian Racasan (c.c.com Moser GmbH, 8074 Grambach, Austria)
16:48 | 8m | Short-paper | Assisting Developers in the Selection of Generative AI Models | Short Paper, Research Track | Raquel Berenguer Mueller (Universitat Oberta de Catalunya), Sergio Cobos (IN3 - UOC), Javier Luis Cánovas Izquierdo (Universitat Oberta de Catalunya), Robert Clarisó (Universitat Oberta de Catalunya)
16:56 | 19m | Live Q&A | Joint Q&A (Engineering GenAI Systems) | CAIN Program
17:15 | 15m | Day closing | Closing | CAIN Program