Seven Failure Points When Engineering a Retrieval Augmented Generation System
Software engineers are increasingly adding semantic search capabilities to applications using a strategy known as Retrieval Augmented Generation (RAG). A RAG system retrieves documents that semantically match a query and then passes them to a large language model (LLM) such as ChatGPT, which extracts the answer from the retrieved content. RAG systems aim to: a) reduce the problem of hallucinated responses from LLMs, b) link sources/references to generated responses, and c) remove the need for annotating documents with metadata. However, RAG systems suffer from limitations inherent to information retrieval systems and from reliance on LLMs. In this paper, we present an experience report on the failure points of RAG systems from three case studies in separate domains: research, education, and biomedical. We share the lessons learned and present 7 failure points to consider when designing a RAG system. The two key takeaways arising from our work are: 1) validation of a RAG system is only feasible during operation, and 2) the robustness of a RAG system evolves rather than being designed in at the start. We conclude with a list of potential research directions on RAG systems for the software engineering community.
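The retrieve-then-generate loop described in the abstract can be summarised in a few lines of code. The sketch below is illustrative only and is not the authors' implementation: the in-memory document list, the bag-of-words cosine similarity, and the generate() stub are hypothetical placeholders standing in for a real vector index and a real LLM call.

```python
# Minimal sketch of a retrieve-then-generate (RAG) loop.
# Assumptions: a toy in-memory corpus, bag-of-words "embeddings", and a
# stubbed generate() in place of an actual LLM request.
from collections import Counter
from math import sqrt

DOCUMENTS = [
    "RAG systems retrieve documents that semantically match a query.",
    "Retrieved documents are passed to an LLM to generate a grounded answer.",
    "Linking sources to responses helps reduce hallucinated answers.",
]

def embed(text: str) -> Counter:
    """Toy 'embedding': a lowercase bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    """Placeholder for the LLM call: a real system would send this prompt to a model."""
    prompt = (
        "Answer using only the context below.\n\n"
        + "\n".join(f"- {c}" for c in context)
        + f"\n\nQuestion: {query}"
    )
    return prompt  # stand-in for the model's response

if __name__ == "__main__":
    question = "How does a RAG system reduce hallucinations?"
    print(generate(question, retrieve(question)))
```

In a production system the retriever would typically query a vector database of pre-embedded chunks and the prompt would be sent to an LLM API; the failure points discussed in the paper arise precisely in those retrieval and generation stages.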
Mon 15 Apr (displayed time zone: Lisbon)
16:00 - 18:00 | System Qualities (Research and Experience Papers / Industry Talks) at Pequeno Auditório | Chair(s): Andrei Paleyes (Department of Computer Science and Technology, University of Cambridge)
16:00 10m Talk | Modeling Resilience of Collaborative AI Systems (Research and Experience Papers) | Diaeddin Rimawi (Free University of Bozen-Bolzano), Antonio Liotta (Free University of Bozen-Bolzano), Marco Todescato (Fraunhofer Italia), Barbara Russo
16:10 10m Talk | Seven Failure Points When Engineering a Retrieval Augmented Generation System (Research and Experience Papers) | Scott Barnett (Applied Artificial Intelligence Institute, Deakin University), Stefanus Kurniawan (Deakin University), Srikanth Thudumu (Deakin University), Zach Brannelly (Deakin University), Mohamed Abdelrazek (Deakin University, Australia)
16:20 15m Talk | POLARIS: A framework to guide the development of Trustworthy AI systems (Research and Experience Papers) | Maria Teresa Baldassarre (Department of Computer Science, University of Bari), Domenico Gigante (SER&Practices and University of Bari), Marcos Kalinowski (Pontifical Catholic University of Rio de Janeiro (PUC-Rio)), Azzurra Ragone (University of Bari)
16:35 15m Talk | Worst-Case Convergence Time of ML Algorithms via Extreme Value Theory (Research and Experience Papers) | Saeid Tizpaz-Niari (University of Texas at El Paso), Sriram Sankaranarayanan (University of Colorado, Boulder)
16:50 15m Talk | Is Your Anomaly Detector Ready for Change? Adapting AIOps Solutions to the Real World (Research and Experience Papers) | Lorena Poenaru-Olaru (TU Delft), Natalia Karpova (TU Delft), Luís Cruz (Delft University of Technology), Jan S. Rellermeyer (Leibniz University Hannover), Arie van Deursen (Delft University of Technology)
17:05 15m Talk | Novel Contract-based Runtime Explainability Framework for End-to-End Ensemble Machine Learning Serving (Research and Experience Papers) | Minh-Tri Nguyen (Aalto University), Hong-Linh Truong (Aalto University), Tram Truong-Huu (Singapore Institute of Technology)
17:20 10m Industry talk | Trustworthy AI: Industry-Guided Tooling of the Methods (Industry Talks) | Zakaria Chihani (CEA, LIST, France)
17:30 15m Live Q&A | System Qualities: Q&A Session (Research and Experience Papers)
17:45 15m Day closing | Closing (Research and Experience Papers) | Jan Bosch (Chalmers University of Technology)