CAIN 2025
Sun 27 - Mon 28 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025
Sun 27 Apr 2025 14:15 - 14:30 at 208 - Architecting and Testing AI Systems Chair(s): Jan-Philipp Steghöfer

Retrieval Augmented Generation (RAG) is increasingly employed in building Generative AI applications, yet the evaluation of RAG pipelines often relies on manual, trial-and-error processes. Automating this evaluation involves generating test data that triggers failures in context comprehension, data formatting, specificity, and content completeness. Random question-answer generation is insufficient for this purpose, and prior works rely on standard QA datasets, benchmarks, and tactics that are not tailored to specific domain requirements. Hence, current approaches and datasets do not trigger sufficiently broad and context-specific failures. In this paper, we introduce evaluation scenarios that describe the process of generating question-answer pairs from content indexed by RAG pipelines. The scenarios are designed to trigger a wider range of failures and to simplify automation, which enables developers to identify and address weaknesses more effectively. We validate our approach on five open-source RAG pipelines using three datasets. Our approach triggers high failure rates by generating prompts that combine multiple questions (up to 91% failure rate), highlighting the need for developers to prioritize handling such queries. We observed failure rates of 60% on an academic domain dataset and 53% and 64% on open-domain datasets. Compared to existing state-of-the-art methods, our approach triggers 77% more failures on average per RAG pipeline and 53% more failures on average per dataset, offering a mechanism that supports developers in improving RAG pipeline quality.
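To make the idea of a multi-question evaluation scenario concrete, the following is a minimal Python sketch. It is not the RAGProbe implementation: the names `QAPair`, `combine_questions`, `answer_fails`, `failure_rate`, and the toy `first_question_only` pipeline are all hypothetical, and the substring check is a deliberately crude stand-in for whatever answer evaluation the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QAPair:
    question: str
    expected: str  # assumption: a correct answer contains this substring


def combine_questions(pairs: List[QAPair]) -> str:
    """Join several questions into one numbered prompt."""
    return " ".join(f"({i}) {p.question}" for i, p in enumerate(pairs, start=1))


def answer_fails(answer: str, pairs: List[QAPair]) -> bool:
    """A combined prompt counts as a failure if any expected answer is missing."""
    lowered = answer.lower()
    return any(p.expected.lower() not in lowered for p in pairs)


def failure_rate(pipeline: Callable[[str], str],
                 scenarios: List[List[QAPair]]) -> float:
    """Fraction of scenarios whose combined prompt the pipeline fails."""
    if not scenarios:
        return 0.0
    failures = sum(
        answer_fails(pipeline(combine_questions(s)), s) for s in scenarios
    )
    return failures / len(scenarios)


def first_question_only(prompt: str) -> str:
    """Toy stand-in for a RAG pipeline that only ever answers question (1)."""
    return "Ottawa" if "(1)" in prompt else ""


scenarios = [
    [QAPair("Where is CAIN 2025 held?", "Ottawa"),
     QAPair("Which conference is it co-located with?", "ICSE")],
    [QAPair("Where is CAIN 2025 held?", "Ottawa")],
]
rate = failure_rate(first_question_only, scenarios)
```

In this toy setup the pipeline answers the single-question scenario but misses the second part of the combined prompt, which is the kind of weakness the abstract says multi-question prompts expose. A real evaluation would replace `first_question_only` with a call into an actual RAG pipeline and use a more robust correctness check.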

Sun 27 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Architecting and Testing AI Systems - Research and Experience Papers at 208
Chair(s): Jan-Philipp Steghöfer, XITASO GmbH IT & Software Solutions
14:00
15m
Talk
How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration (Distinguished Paper Award Candidate)
Research and Experience Papers
Shreyas Kumar Parida, ETH Zurich; Ilias Gerostathopoulos, Vrije Universiteit Amsterdam; Justus Bogner, Vrije Universiteit Amsterdam
14:15
15m
Talk
RAGProbe: Breaking RAG Pipelines with Evaluation Scenarios (Distinguished Paper Award Candidate)
Research and Experience Papers
Shangeetha Sivasothy, Applied Artificial Intelligence Institute, Deakin University; Scott Barnett, Deakin University, Australia; Stefanus Kurniawan, Deakin University; Zafaryab Rasool, Applied Artificial Intelligence Institute, Deakin University; Rajesh Vasa, Deakin University, Australia
14:30
15m
Talk
On-Device or Remote? On the Energy Efficiency of Fetching LLM-Generated Content
Research and Experience Papers
Vince Nguyen, Vrije Universiteit Amsterdam; Hieu Huynh, Vrije Universiteit Amsterdam; Vidya Dhopate, Vrije Universiteit Amsterdam; Anusha Annengala, Vrije Universiteit Amsterdam; Hiba Bouhlal, Vrije Universiteit Amsterdam; Gian Luca Scoccia, Gran Sasso Science Institute; Matias Martinez, Universitat Politècnica de Catalunya (UPC); Vincenzo Stoico, Vrije Universiteit Amsterdam; Ivano Malavolta, Vrije Universiteit Amsterdam
14:45
10m
Talk
LoCoML: A Framework for Real-World ML Inference Pipelines
Research and Experience Papers
Kritin Maddireddy, IIIT Hyderabad; Santhosh Kotekal Methukula, IIIT Hyderabad; Chandrasekar S, IIIT Hyderabad; Karthik Vaidhyanathan, IIIT Hyderabad
14:55
10m
Talk
Towards Continuous Experiment-driven MLOps
Research and Experience Papers
Keerthiga Rajenthiram, Vrije Universiteit Amsterdam; Milad Abdullah, Charles University; Ilias Gerostathopoulos, Vrije Universiteit Amsterdam; Petr Hnětynka, Charles University; Tomas Bures, Charles University, Czech Republic; Gerard Pons, Universitat Politècnica de Catalunya, Barcelona, Spain; Besim Bilalli, Universitat Politècnica de Catalunya, Barcelona, Spain; Anna Queralt, Universitat Politècnica de Catalunya, Barcelona, Spain
15:05
25m
Other
Discussion
Research and Experience Papers