ICST 2025
Mon 31 March - Fri 4 April 2025 Naples, Italy

This program is tentative and subject to change.

Wed 2 Apr 2025 12:00 - 12:15 at Aula Magna (AM) - LLMs in Testing Chair(s): Phil McMinn

Flaky tests are problematic because they non-deterministically pass or fail for the same software version under test, causing confusion and wasting development effort. While machine learning models have been used to predict flakiness and its root causes, there is much less work on providing support to fix the problem. To address this gap, in this paper, we focus on predicting the type of fix that is required to remove flakiness and then repair the test code on that basis. We do this for a subset of flaky tests where the root cause of flakiness is in the test itself and not in the production code. One key idea is to guide the repair process with additional knowledge about the test’s flakiness in the form of its predicted fix category. Thus, we first propose a framework that automatically generates labeled datasets for 13 fix categories and trains models to predict the fix category of a flaky test by analyzing the test code only. Our experimental results using code models and few-shot learning show that we can correctly predict most of the fix categories. To show the usefulness of such fix category labels for automatically repairing flakiness, we augment the prompts of GPT-3.5 Turbo, a Large Language Model (LLM), with such extra knowledge to request repair suggestions. The results show that our suggested fix category labels, complemented with in-context learning, significantly enhance the capability of GPT-3.5 Turbo in generating fixes for flaky tests. Based on the execution and analysis of a sample of GPT-repaired flaky tests, we estimate that a large percentage of such repairs (roughly between 51% and 83%) can be expected to pass. For the failing repaired tests, on average, 16% of the test code needs to be further changed for them to pass.
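To make the prompt-augmentation idea concrete, the sketch below shows one way a predicted fix-category label could be injected into a GPT-3.5 Turbo repair request via the OpenAI chat completions API. It is an illustrative assumption only: the prompt wording, the fix-category label, the example flaky test, and the helper name build_repair_prompt are not taken from the FlakyFix paper, and the paper's actual prompts also include in-context (few-shot) examples, which are omitted here for brevity.

```python
# Illustrative sketch only: prompt template, category label, and example test
# are hypothetical and not the paper's actual artifacts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def build_repair_prompt(test_code: str, fix_category: str) -> str:
    """Augment the repair request with the predicted fix-category label."""
    return (
        "The following test is flaky. The predicted fix category is: "
        f"{fix_category}.\n"
        "Rewrite only the test code so that it is no longer flaky, "
        "applying a fix consistent with that category.\n\n"
        f"{test_code}"
    )


flaky_test = '''\
def test_fetch_status():
    resp = start_async_fetch()
    assert resp.status == "DONE"   # may still be "PENDING" when checked
'''

prompt = build_repair_prompt(flaky_test, "add wait/poll for asynchronous result")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```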


Wed 2 Apr

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

11:00 - 12:30
LLMs in Testing (Research Papers / Industry / Journal-First Papers) at Aula Magna (AM)
Chair(s): Phil McMinn University of Sheffield
11:00
15m
Talk
AugmenTest: Enhancing Tests with LLM-driven Oracles
Research Papers
Shaker Mahmud Khandaker Fondazione Bruno Kessler, Fitsum Kifetew Fondazione Bruno Kessler, Davide Prandi Fondazione Bruno Kessler, Angelo Susi Fondazione Bruno Kessler
Pre-print
11:15
15m
Talk
Impact of Large Language Models of Code on Fault Localization
Research Papers
Suhwan Ji Yonsei University, Sanghwa Lee Kangwon National University, Changsup Lee Kangwon National University, Yo-Sub Han Yonsei University, Hyeonseung Im Kangwon National University, South Korea
11:30
15m
Talk
An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
Research Papers
Riddhi More Ontario Tech University, Jeremy Bradbury Ontario Tech University
11:45
15m
Talk
Evaluating the Effectiveness of LLMs in Detecting Security Vulnerabilities
Research Papers
Avishree Khare, Saikat Dutta Cornell University, Ziyang Li University of Pennsylvania, Alaia Solko-Breslin University of Pennsylvania, Mayur Naik University of Pennsylvania, Rajeev Alur University of Pennsylvania
12:00
15m
Talk
FlakyFix: Using Large Language Models for Predicting Flaky Test Fix Categories and Test Code Repair
Journal-First Papers
Sakina Fatima University of Ottawa, Hadi Hemmati York University, Lionel Briand University of Ottawa, Canada; Lero Centre, University of Limerick, Ireland
12:15
15m
Talk
Integrating LLM-based Text Generation with Dynamic Context Retrieval for GUI Testing
Industry
Juyeon Yoon Korea Advanced Institute of Science and Technology, Seah Kim Samsung Research, Somin Kim Korea Advanced Institute of Science and Technology, Sukchul Jung Samsung Research, Shin Yoo Korea Advanced Institute of Science and Technology