ICSME 2025
Sun 7 - Fri 12 September 2025 Auckland, New Zealand
Thu 11 Sep 2025 14:20 - 14:30 at Case Room 3 260-055 - Session 9 - Testing 3 Chair(s): Sigrid Eldh

Snapshot testing has emerged as a critical technique for UI validation in modern software development, yet it suffers from substantial maintenance overhead due to frequent UI changes causing test failures that require manual inspection to distinguish between genuine regressions and intentional design changes. This manual triage process becomes increasingly burdensome as applications evolve, creating a need for automated analysis solutions. This paper introduces LLMShot, a novel framework that leverages Vision-Language Models (VLMs) to automatically analyze snapshot test failures through semantic classification of UI changes. To evaluate LLMShot’s effectiveness, we developed a comprehensive dataset using a feature-rich iOS application with configurable feature flags, creating realistic scenarios that produce authentic snapshot differences representative of real development workflows. Our evaluation using Gemma3 models demonstrates strong classification performance, with the 12B variant achieving over 84% recall in identifying failure root causes while the 4B model offers practical deployment advantages with acceptable performance for continuous integration environments. However, our exploration of selective ignore mechanisms revealed significant limitations in current prompting-based approaches for controllable visual reasoning. LLMShot represents the first automated approach to semantic snapshot test analysis, offering developers structured insights that can substantially reduce manual triage effort and advance toward more intelligent UI testing paradigms.
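The triage step the abstract describes — asking a vision-language model to classify a snapshot-test failure and report whether it is a genuine regression — can be sketched roughly as below. This is an illustrative sketch only, not the authors' implementation: the label taxonomy, the JSON reply schema, and the canned model reply are all assumptions, and a real pipeline would attach the baseline and actual screenshots to an actual VLM call (e.g. a locally hosted Gemma 3 model).

```python
import json

# Hypothetical root-cause labels a VLM triage step might emit;
# the actual LLMShot taxonomy is not reproduced here.
LABELS = {"layout_shift", "color_change", "missing_element", "text_change"}

def build_triage_prompt(test_name: str) -> str:
    """Compose an instruction asking the VLM to compare two screenshots."""
    return (
        f"Snapshot test '{test_name}' failed. Compare the baseline and "
        "actual screenshots and reply with JSON: "
        '{"label": <root-cause category>, "regression": <true|false>, '
        '"rationale": <one sentence>}'
    )

def parse_triage_response(raw: str) -> dict:
    """Validate the model's JSON reply before surfacing it to developers."""
    result = json.loads(raw)
    if result.get("label") not in LABELS:
        raise ValueError(f"unknown label: {result.get('label')}")
    if not isinstance(result.get("regression"), bool):
        raise ValueError("'regression' must be a boolean")
    return result

# A canned reply stands in for the real VLM call.
reply = (
    '{"label": "color_change", "regression": false, '
    '"rationale": "Accent color updated to match the new theme."}'
)
verdict = parse_triage_response(reply)
print(verdict["label"], verdict["regression"])
```

Validating the structured reply before acting on it matters in a CI setting: a malformed or out-of-vocabulary answer should fail loudly rather than silently auto-dismiss a real regression.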

Thu 11 Sep

Displayed time zone: Auckland, Wellington

13:30 - 15:00
Session 9 - Testing 3 (Journal First Track / NIER Track / Tool Demonstration Track / Research Papers Track / Registered Reports) at Case Room 3 260-055
Chair(s): Sigrid Eldh Ericsson AB, Mälardalen University, Carleton University
13:30
15m
Full-paper
Metamorphic Testing of Large Language Models for Natural Language Processing
Research Papers Track
Steven Cho The University of Auckland, New Zealand, Stefano Ruberto JRC European Commission, Valerio Terragni University of Auckland
Pre-print
13:45
15m
Onweer: Automated Resilience Testing through Fuzzing
Research Papers Track
Gilles Coremans Vrije Universiteit Brussel, Coen De Roover Vrije Universiteit Brussel
Pre-print
14:00
10m
Generating Highly Structured Test Inputs Leveraging Constraint-Guided Graph Refinement
Registered Reports
Zhaorui Yang University of California, Riverside, Yuxin Qiu University of California at Riverside, Haichao Zhu Meta, Qian Zhang University of California at Riverside
14:10
10m
Prioritizing Test Smells: An Empirical Evaluation of Quality Metrics and Developer Perceptions
NIER Track
Md Arif Hasan University of Dhaka, Bangladesh, Toukir Ahammed Institute of Information Technology, University of Dhaka
14:20
10m
LLMShot: Reducing snapshot testing maintenance via LLMs
NIER Track
Ergün Batuhan Kaynak Bilkent University, Mayasah Lami Bilkent University, Sahand Moslemi Yengejeh Bilkent University, Anil Koyuncu Bilkent University
Pre-print
14:30
15m
Combinatorial Transition Testing in Dynamically Adaptive Systems: Implementation and Test Oracle
Journal First Track
Pierre Martou UCLouvain / ICTEAM, Benoît Duhoux Université catholique de Louvain, Belgium, Kim Mens Université catholique de Louvain, ICTEAM institute, Belgium, Axel Legay Université Catholique de Louvain, Belgium
14:45
10m
LLMLOOP: Improving LLM-Generated Code and Tests through Automated Iterative Feedback Loops
Tool Demonstration Track
Ravin Ravi University of Auckland, Dylan Bradshaw University of Auckland, Stefano Ruberto JRC European Commission, Gunel Jahangirova King's College London, Valerio Terragni University of Auckland
Pre-print