ICST 2026
Mon 18 - Fri 22 May 2026 Daejeon, South Korea
Tue 19 May 2026 16:25 - 16:50 at Room 101 - LLM-Assisted Test Generation Chair(s): Shifat Sahariar Bhuiyan

The rapid evolution of Large Language Models (LLMs) has significantly impacted software engineering, leading to a growing number of studies exploring their use in automated unit test generation. However, the standalone use of LLMs without post-processing has proven insufficient, often resulting in a high number of tests that fail to compile or fail to achieve high coverage. Several techniques and tools have been proposed to address these issues, reporting substantial improvements in test compilation and coverage. While interesting and important, LLM-based test generation techniques have been evaluated against relatively weak baselines (by today's standards), i.e., old LLM versions and relatively weak prompts, which may exaggerate the performance contribution of the approaches. In other words, it is likely that the use of stronger (newer) LLMs may obviate any advantage that these techniques bring. We investigate this issue by replicating four state-of-the-art LLM-based test generation tools, HITS, SymPrompt, TestSpark, and CoverUp, which include engineering components aimed at guiding the test generation process through test compilation and execution feedback, and we evaluate their relative effectiveness and efficiency against a plain LLM test generation method. We integrate current versions of LLMs into all the approaches, which are later versions than the ones used in their original studies, and conduct an experiment using a dataset comprising 393 classes and 3,657 methods. Perhaps surprisingly, our results show that the plain LLM-based approach can outperform previous state-of-the-art approaches in all test effectiveness metrics we used: line coverage (by 17.72%), branch coverage (by 19.80%), and mutation score (by 20.92%), and it does so at a comparable cost (number of LLM queries). We also observe that the level of granularity at which the plain LLM-based approach is applied has a significant impact on the cost involved.
We therefore propose targeting first the program classes, where test generation is more efficient, and then the uncovered methods as a possible way to reduce the number of LLM requests. We find that such an approach achieves test effectiveness comparable to (slightly higher than) the other methods while requiring approximately 20% fewer requests to the LLM.
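The class-first, then uncovered-methods idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: `ClassUnderTest`, `class_then_method`, `method_only`, and the toy `fake_generate` stand-in for an LLM call are all hypothetical names introduced here, and the request counts it produces are toy numbers, not the results reported in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class ClassUnderTest:
    name: str
    methods: List[str]


@dataclass
class Result:
    covered: Set[str] = field(default_factory=set)
    requests: int = 0


# Hypothetical generator signature: given a target and its methods,
# return the methods that the generated tests end up covering.
Generator = Callable[[str, List[str]], Set[str]]


def class_then_method(cut: ClassUnderTest, generate: Generator) -> Result:
    """Stage 1: one class-level request. Stage 2: one request per uncovered method."""
    result = Result()
    result.covered |= generate(cut.name, cut.methods)
    result.requests += 1
    for method in cut.methods:
        if method not in result.covered:
            result.covered |= generate(f"{cut.name}.{method}", [method])
            result.requests += 1
    return result


def method_only(cut: ClassUnderTest, generate: Generator) -> Result:
    """Baseline granularity: one request per method, no class-level pass."""
    result = Result()
    for method in cut.methods:
        result.covered |= generate(f"{cut.name}.{method}", [method])
        result.requests += 1
    return result


def fake_generate(target: str, methods: List[str]) -> Set[str]:
    # Toy assumption: class-level requests miss methods whose name starts
    # with "hard"; method-level requests always cover their single method.
    if "." in target:
        return set(methods)
    return {m for m in methods if not m.startswith("hard")}


cut = ClassUnderTest("Stack", ["push", "pop", "hard_resize", "peek"])
two_stage = class_then_method(cut, fake_generate)
per_method = method_only(cut, fake_generate)
print(two_stage.requests, per_method.requests)
```

In this toy setup both strategies cover the same methods, but the two-stage variant issues fewer requests because the class-level pass already covers most methods. How large the saving is in practice depends on how much a real class-level request actually covers.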

Tue 19 May

Displayed time zone: Seoul

16:00 - 17:30
LLM-Assisted Test Generation: Short Papers, Vision and Emerging Results / Research Papers at Room 101
Chair(s): Shifat Sahariar Bhuiyan Università della Svizzera italiana
16:00
25m
Talk
Consistency Meets Verification: Enhancing Test Generation Quality in Large Language Models Without Ground-Truth Solutions
Research Papers
Hamed Taherkhani York University, Alireza Daghighfarsoodeh York University, Mohammad Chowdhury York University, Hung Viet Pham York University, Hadi Hemmati York University
16:25
25m
Talk
How well LLM-based test generation techniques perform with newer LLM versions?
Research Papers
Michael Konstantinou University of Luxembourg, Renzo Degiovanni Luxembourg Institute of Science and Technology, Mike Papadakis University of Luxembourg
16:50
25m
Talk
Improving Automated Patch Correctness Assessment by Designing LLM-Based Oracles (Artifact Reviewed, Artifact Available)
Research Papers
Inyeong Jang Duksung Women's University, Jinyoung Kim Sungkyunkwan University
17:15
15m
Talk
Developer vs. DSpot vs. ChatGPT: A Comparative Study of JUnit Test Amplification
Short Papers, Vision and Emerging Results
David Onyango Owuor North Dakota State University, Ajay Jha North Dakota State University