This program is tentative and subject to change.
Automated test generation is crucial for ensuring the reliability and robustness of software applications while reducing the effort needed to achieve them. While significant progress has been made in test generation research, generating valid test oracles remains an open problem. To address this challenge, we present AugmenTest, an approach leveraging Large Language Models (LLMs) to infer correct test oracles based on available documentation of the software under test. Unlike most existing methods that rely on code, AugmenTest uses the semantic capabilities of LLMs to infer the intended behavior of a method from documentation and developer comments, without looking at the code. AugmenTest includes four variants: Simple Prompt, Extended Prompt, RAG with a generic prompt (without the context of the class or method under test), and RAG with Simple Prompt, each offering a different level of contextual information to the LLM. To evaluate our work, we selected 158 Java classes and generated multiple mutants for each. We then generated tests from these mutants, keeping only tests that passed on the mutant but failed on the original class, to ensure that the tests effectively captured bugs. This resulted in 203 unique tests with distinct bugs, which were then used to evaluate AugmenTest. Results show that in the most conservative scenario, AugmenTest’s Extended Prompt consistently outperformed the Simple Prompt, achieving a success rate of 30% for generating correct assertions. In comparison, the state-of-the-art TOGA approach achieved 8.2%. Contrary to our expectations, the RAG-based approaches did not lead to improvements, achieving an 18.2% success rate in the most conservative scenario. Our study demonstrates the potential of LLMs for improving the reliability of automated test generation tools, while also highlighting areas for future enhancement.
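To illustrate the kind of oracle the abstract refers to, the following minimal sketch (ours, not taken from the paper) uses the real Apache Commons Lang method StringUtils.capitalize as the method under test. The test prefix is the sort of code an automated generator would produce; the final assertion is the oracle that an approach like AugmenTest would ask the LLM to infer from the Javadoc and developer comments alone, without seeing the method's implementation.

```java
import static org.junit.Assert.assertEquals;

import org.apache.commons.lang3.StringUtils;
import org.junit.Test;

public class StringUtilsCapitalizeTest {

    /*
     * Documentation given to the LLM (no source code is shown):
     * "Capitalizes a String changing the first character to title case.
     *  No other characters are changed. A null input String returns null."
     */
    @Test
    public void capitalizeTurnsFirstCharacterToTitleCase() {
        // Test prefix as produced by an automated test generator.
        String result = StringUtils.capitalize("hello");

        // Oracle: this assertion is the part that would be inferred by the
        // LLM from the documentation rather than from the code.
        assertEquals("Hello", result);
    }
}
```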
Wed 2 Apr · Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
11:00 - 12:30 | LLMs in Testing (Research Papers / Industry / Journal-First Papers) at Aula Magna (AM). Chair(s): Phil McMinn, University of Sheffield
11:00 15m Talk | AugmenTest: Enhancing Tests with LLM-driven Oracles (Research Papers). Shaker Mahmud Khandaker (Fondazione Bruno Kessler), Fitsum Kifetew (Fondazione Bruno Kessler), Davide Prandi (Fondazione Bruno Kessler), Angelo Susi (Fondazione Bruno Kessler). Pre-print
11:15 15m Talk | Impact of Large Language Models of Code on Fault Localization (Research Papers). Suhwan Ji (Yonsei University), Sanghwa Lee (Kangwon National University), Changsup Lee (Kangwon National University), Yo-Sub Han (Yonsei University), Hyeonseung Im (Kangwon National University, South Korea)
11:30 15m Talk | An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification (Research Papers)
11:45 15m Talk | Evaluating the Effectiveness of LLMs in Detecting Security Vulnerabilities (Research Papers). Avishree Khare, Saikat Dutta (Cornell University), Ziyang Li (University of Pennsylvania), Alaia Solko-Breslin (University of Pennsylvania), Mayur Naik (UPenn), Rajeev Alur (University of Pennsylvania)
12:00 15m Talk | FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair (Journal-First Papers). Sakina Fatima (University of Ottawa), Hadi Hemmati (York University), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland)
12:15 15m Talk | Integrating LLM-based Text Generation with Dynamic Context Retrieval for GUI Testing (Industry). Juyeon Yoon (Korea Advanced Institute of Science and Technology), Seah Kim (Samsung Research), Somin Kim (Korea Advanced Institute of Science and Technology), Sukchul Jung (Samsung Research), Shin Yoo (Korea Advanced Institute of Science and Technology)