ASE 2025
Sun 16 - Thu 20 November 2025 Seoul, South Korea

Redundant test cases, although well studied in software engineering, have been underexplored in UI testing of mobile apps. Our study of large-scale, real-world test suites shows that redundancy in UI testing often manifests as redundant UI interactions. Although negligible in traditional script-based workflows, such redundancy severely impacts the efficiency of emerging Large Language Model (LLM)-based UI agents, which incur substantial decision latency and token costs from repeated LLM queries for the same interactions. To address this, building on the idea of reusing an LLM's earlier decisions, we present TestWeaver, a cost-effective LLM-based testing framework. Leveraging a semantically annotated UI Transition Graph (UTG), TestWeaver detects interactions shared across test cases. It processes each interaction with a single LLM query and reuses the result whenever the same interaction occurs. We evaluate TestWeaver on real-world test suites from Meituan. It achieves a 92% success rate at an average cost of $0.11 and 89.7 seconds per case, outperforming the state of the art. We have also deployed TestWeaver in a real-world testing workflow at Meituan for over six months, during which it has executed nearly 2,000 test cases and uncovered 10 previously undetected bugs while reducing manual testing effort by 75%.
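
The reuse idea can be illustrated with a minimal Python sketch. This is not TestWeaver's implementation: the abstract does not specify how the UTG identifies "the same interaction", so the cache key used here (a UI state identifier paired with a normalized step description) is an assumption, and `DecisionCache`, `fake_llm`, and `run_suite` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumed cache key: (UI state id, normalized step description).
InteractionKey = Tuple[str, str]
TestCase = List[Tuple[str, str]]


@dataclass
class DecisionCache:
    """Queries the LLM once per distinct interaction and reuses the answer."""
    query_llm: Callable[[str, str], str]
    decisions: Dict[InteractionKey, str] = field(default_factory=dict)
    llm_calls: int = 0

    def decide(self, state_id: str, step: str) -> str:
        key = (state_id, step.strip().lower())
        if key not in self.decisions:
            # First occurrence of this interaction: pay for one LLM query.
            self.llm_calls += 1
            self.decisions[key] = self.query_llm(state_id, step)
        # Any later occurrence is served from the cache.
        return self.decisions[key]


def fake_llm(state_id: str, step: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a symbolic action."""
    return f"tap:{state_id}/{step.strip().lower().replace(' ', '_')}"


def run_suite(cache: DecisionCache, suite: List[TestCase]) -> List[List[str]]:
    """Resolves every step of every test case to an action via the cache."""
    return [[cache.decide(state, step) for state, step in case] for case in suite]


# Two test cases that share their first two interactions.
suite: List[TestCase] = [
    [("home", "open login"), ("login", "enter credentials"), ("home", "open cart")],
    [("home", "open login"), ("login", "enter credentials"), ("home", "open orders")],
]

cache = DecisionCache(fake_llm)
actions = run_suite(cache, suite)
print(cache.llm_calls)  # 4 queries for 6 steps
```

In this toy suite, six steps resolve with four LLM queries because the two shared interactions are decided once and reused. A real system would also need to decide when a cached decision is no longer valid, for example after the UI changes, which this sketch does not attempt.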