FORGE 2024
Sun 14 Apr 2024 Lisbon, Portugal
co-located with ICSE 2024

Augmented generation techniques such as Retrieval-Augmented Generation (RAG) and Cache-Augmented Generation (CAG) have revolutionized the field by enhancing large language model (LLM) outputs with external knowledge and cached information. However, the integration of vector databases, which serve as a backbone for these augmentations, introduces critical challenges, particularly in ensuring accurate vector matching. False vector matching in these databases can significantly compromise the integrity and reliability of LLM outputs, leading to misinformation or erroneous responses. Despite the crucial impact of these issues, there is a notable research gap in methods to effectively detect and address false vector matches in LLM-augmented generation. This paper presents MeTMaP, a metamorphic testing framework developed to identify false vector matching in LLM-augmented generation systems. We derive eight metamorphic relations (MRs) from six NLP datasets, which form our method’s core, based on the idea that semantically similar texts should match and dissimilar ones should not. MeTMaP uses these MRs to create sentence triplets for testing, simulating real-world LLM scenarios. Our evaluation of MeTMaP over 203 vector matching configurations, involving 29 embedding models and 7 distance metrics, uncovers significant inaccuracies. The results, showing a maximum accuracy of only 41.51% on our tests compared to the original datasets, emphasize the widespread issue of false matches in vector matching methods and the critical need for effective detection and mitigation in LLM-augmented applications.
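As a rough illustration of the triplet idea described in the abstract, the sketch below checks whether a base sentence scores closer to a semantically similar sentence than to a dissimilar one. It is not the authors' implementation: the bag-of-words `embed` function, the `triplet_holds` helper, and the example sentences are all hypothetical stand-ins, chosen so the example runs without any embedding model.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (an assumption; stands in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def triplet_holds(base, positive, negative, embed_fn=embed, sim_fn=cosine_similarity):
    """Hypothetical check: the base should be closer to the similar sentence
    (positive) than to the dissimilar one (negative)."""
    base_vec = embed_fn(base)
    return sim_fn(base_vec, embed_fn(positive)) > sim_fn(base_vec, embed_fn(negative))


# A paraphrase with little word overlap versus a near-copy with opposite meaning.
false_match_case = triplet_holds(
    "the build passed",
    "the build succeeded",
    "the build never passed",
)

# An easy case where the dissimilar sentence shares no words with the base.
easy_case = triplet_holds(
    "the cat sat",
    "the cat sat down",
    "dogs bark loudly",
)

print(false_match_case, easy_case)
```

With this toy embedding, the first triplet fails because the sentence with the opposite meaning shares more words with the base than the paraphrase does, which is the kind of false match the abstract refers to. Swapping `embed_fn` and `sim_fn` for a real embedding model and distance metric would give one of the configurations the paper evaluates.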

Sun 14 Apr

Displayed time zone: Lisbon

11:00 - 12:30
Foundation Models for Software Quality Assurance
Research Track at Luis de Freitas Branco
Chair(s): Matteo Ciniselli Università della Svizzera Italiana
11:00
14m
Full-paper
Deep Multiple Assertions Generation (Full Paper)
Research Track
Hailong Wang Zhejiang University, Tongtong Xu Huawei, Bei Wang Huawei
11:14
14m
Full-paper
MeTMaP: Metamorphic Testing for Detecting False Vector Matching Problems in LLM Augmented Generation (Full Paper)
Research Track
Guanyu Wang Beijing University of Posts and Telecommunications, Yuekang Li The University of New South Wales, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Li Tianlin Nanyang Technological University, Guosheng Xu Beijing University of Posts and Telecommunications, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology, Kailong Wang Huazhong University of Science and Technology
11:28
14m
Full-paper
Planning to Guide LLM for Code Coverage Prediction (Full Paper)
Research Track
Hridya Dhulipala University of Texas at Dallas, Aashish Yadavally University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas
11:42
7m
Short-paper
The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks (New Idea Paper)
Research Track
Ashwin Prasad Shivarpatna Venkatesh University of Paderborn, Samkutty Sabu University of Paderborn, Amir Mir Delft University of Technology, Sofia Reis Instituto Superior Técnico, U. Lisboa & INESC-ID, Eric Bodden
11:49
14m
Full-paper
Reality Bites: Assessing the Realism of Driving Scenarios with Large Language Models (Full Paper)
Research Track
Jiahui Wu Simula Research Laboratory and University of Oslo, Chengjie Lu Simula Research Laboratory and University of Oslo, Aitor Arrieta Mondragon University, Tao Yue Beihang University, Shaukat Ali Simula Research Laboratory and Oslo Metropolitan University
12:03
7m
Short-paper
Assessing the Impact of GPT-4 Turbo in Generating Defeaters for Assurance Cases (New Idea Paper)
Research Track
Kimya Khakzad Shahandashti York University, Mithila Sivakumar York University, Mohammad Mahdi Mohajer York University, Alvine Boaye Belle York University, Song Wang York University, Timothy Lethbridge University of Ottawa
12:10
20m
Other
Discussion
Research Track