LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models
Test suites tend to grow as software evolves, often making it infeasible to execute all test cases within the allocated testing budget, especially for large software systems. Test suite minimization (TSM) improves the efficiency of software testing by removing redundant test cases, thus reducing testing time and resources while maintaining the fault detection capability of the test suite. Most existing TSM approaches rely on code coverage (white-box) or model-based features, which are not always available to test engineers. Recent TSM approaches that rely only on test code (black-box), such as ATM and FAST-R, have been proposed; the former yields higher fault detection rates (FDR), while the latter is faster. To address scalability while retaining a high FDR, we propose LTM (Language model-based Test suite Minimization), a novel, scalable, and black-box similarity-based TSM approach based on large language models (LLMs), which is the first application of LLMs in the context of TSM. To support similarity measurement using test method embeddings, we investigate five pre-trained language models: CodeBERT, GraphCodeBERT, UniXcoder, StarEncoder, and CodeLlama, together with two similarity measures computed on their embeddings: Cosine Similarity and Euclidean Distance. Our goal is to find similarity measures that are not only computationally more efficient but can also better guide a Genetic Algorithm (GA), which is used to search for optimal minimized test suites, thus reducing the overall search time.
Experimental results show that the best configuration of LTM (UniXcoder/Cosine) outperforms ATM in three aspects: (a) achieving a slightly greater saving rate of testing time (41.72% versus 41.02%, on average); (b) attaining a significantly higher fault detection rate (0.84 versus 0.81, on average); and, most importantly, (c) minimizing test suites nearly five times faster on average, with higher gains for larger test suites and systems, thus achieving much higher scalability.
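The two similarity measures named in the abstract can be sketched as follows. This is an illustrative example only, not LTM's implementation: the toy vectors stand in for test-method embeddings that a real pipeline would obtain from a pre-trained model such as CodeBERT or UniXcoder.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Euclidean (L2) distance between two embedding vectors (0.0 = identical)."""
    return float(np.linalg.norm(a - b))

# Toy stand-ins for test-method embeddings.
t1 = np.array([1.0, 0.0, 1.0])
t2 = np.array([1.0, 0.1, 0.9])   # nearly redundant with t1
t3 = np.array([0.0, 1.0, 0.0])   # dissimilar to t1

# t2 is more similar to t1 than t3 under both measures, so a
# similarity-guided search would treat t2 as the redundancy candidate.
print(cosine_similarity(t1, t2) > cosine_similarity(t1, t3))    # True
print(euclidean_distance(t1, t2) < euclidean_distance(t1, t3))  # True
```

In a diversity-driven minimization such as the one described above, a GA fitness function would aggregate these pairwise scores over a candidate subset, preferring subsets whose tests are mutually dissimilar.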
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
14:00 - 15:20 | Testing 1 (Journal First / Industry Papers / Research Papers), Aurora B. Chair(s): Jialun Cao (Hong Kong University of Science and Technology)

14:00 (20m, Talk) | Automated Soap Opera Testing Directed by LLMs and Scenario Knowledge: Feasibility, Challenges, and Road Ahead. Research Papers. Yanqi Su (Australian National University), Zhenchang Xing (CSIRO's Data61), Chong Wang (Nanyang Technological University), Chunyang Chen (TU Munich), Xiwei (Sherry) Xu (Data61, CSIRO), Qinghua Lu (Data61, CSIRO), Liming Zhu (CSIRO's Data61)

14:20 (20m, Talk) | Automated Test Case Repair Using Language Models. Journal First. Ahmadreza Saboor Yaraghi (University of Ottawa), Darren Holden (Carleton University), Nafiseh Kahani (Carleton University), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland)

14:40 (20m, Talk) | TestGPT-Server: Automatically Testing Microservices with Large Language Models at ByteDance. Industry Papers. Jue Wang (ByteDance), Shuxiang Chen (ByteDance), Yu Liu (ByteDance), Yuan Deng (ByteDance), Lei Zhang (ByteDance), Yuanchang Fu (ByteDance), Bo Liu (ByteDance)

15:00 (20m, Talk) | LTM: Scalable and Black-Box Similarity-Based Test Suite Minimization Based on Language Models. Journal First. Rongqi Pan (University of Ottawa), Taher A. Ghaleb (Trent University), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland)
Aurora B is the second room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.