Element-Aware Fine-Tuning of Vision-Language Models for Cost-Efficient GUI Testing in an Industrial Setting (ASE 2025 - Industry Showcase)

Who

Mengzhou Wu, Yuzhe Guo, Yuan Cao, Haochuan Lu, Hengyu Zhang, Xia Zeng, Liangchao Yao, Yuetang Deng, Dezhi Ran, Wei Yang, Tao Xie

Track

ASE 2025 Industry Showcase

This program is tentative and subject to change.

Time Zone

The program is currently displayed in (GMT+09:00) Seoul.

When

Wed 19 Nov 2025 16:50 - 17:00 at Grand Hall 1 - Testing & Analysis 3

Abstract

User Interface (UI) testing is crucial for quality assurance of industrial mobile applications, yet it remains labor-intensive and challenging to automate effectively. Recent advances in Vision-Language Models (VLMs) present a promising solution for automating GUI testing by mapping natural language instructions to pixels, significantly reducing the manual effort required for writing test scripts and even designing test cases. While numerous VLMs have been proposed and evaluated for GUI testing, they often fail to meet two critical industrial requirements: (1) effectiveness and reliability when handling complex, multi-step workflows in industrial applications, and (2) efficiency and cost-effectiveness for the large-scale, high-frequency testing environments typical of industrial settings. To address these requirements, in this paper we report our experiences in developing and deploying \toolname{}, a three-stage approach that enables VLMs to explicitly detect and reason over discrete GUI elements, thereby overcoming the limitations of pixel-based reasoning and improving both efficiency and effectiveness. In the first stage, \toolname{} integrates a lightweight UI-element detector named OmniParser to decompose UI screenshots into structured element representations with semantic annotations and spatial relationships. In the second stage, \toolname{} fine-tunes a VLM to reason about natural language instructions over the detected UI elements, empowering efficient small models to outperform expensive large models. Comprehensive evaluations on public benchmarks and deployment at WeChat show that \toolname{} consistently achieves superior accuracy and efficiency compared to state-of-the-art VLMs. Specifically, \toolname{} enables a fine-tuned Qwen2.5-VL-3B model to outperform a 72B model with 75% less training data, validating the effectiveness of incorporating domain knowledge into VLM-based GUI testing. We summarize three major lessons learned from developing and deploying \toolname{}.
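
The element-level reasoning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `UIElement` record, the prompt layout, and the helpers `build_prompt` and `resolve_click` are all hypothetical names chosen for illustration, and the detector and the VLM call are left out entirely. The sketch only shows the general idea of asking a model to pick a discrete element id instead of predicting raw pixel coordinates, then mapping that id back to a tap position.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """Hypothetical record for one detected GUI element (not the paper's schema)."""
    element_id: int
    label: str                        # semantic annotation, e.g. "Send button"
    bbox: tuple[int, int, int, int]   # (left, top, right, bottom) in pixels


def build_prompt(instruction: str, elements: list[UIElement]) -> str:
    """Serialize detected elements and the instruction into a text prompt."""
    lines = [f"[{e.element_id}] {e.label} bbox={e.bbox}" for e in elements]
    return (
        "Elements:\n" + "\n".join(lines)
        + f"\nInstruction: {instruction}\nAnswer with one element id."
    )


def resolve_click(model_answer: str, elements: list[UIElement]) -> tuple[int, int]:
    """Map the model's answer (an element id) to the center of that element."""
    match = re.search(r"\d+", model_answer)
    if match is None:
        raise ValueError(f"no element id found in answer: {model_answer!r}")
    by_id = {e.element_id: e for e in elements}
    chosen = by_id.get(int(match.group()))
    if chosen is None:
        raise ValueError(f"unknown element id in answer: {model_answer!r}")
    left, top, right, bottom = chosen.bbox
    return ((left + right) // 2, (top + bottom) // 2)


if __name__ == "__main__":
    # Assumed detector output for a toy chat screen.
    screen = [
        UIElement(0, "Search field", (0, 0, 200, 40)),
        UIElement(1, "Send button", (220, 0, 300, 40)),
    ]
    print(build_prompt("Tap the send button", screen))
    # A real system would obtain this answer from the fine-tuned model.
    print(resolve_click("1", screen))
```

Because the model only has to name an element, its output space is a short list of ids rather than the full pixel grid, which is one plausible reading of why the abstract credits element awareness with helping small models.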

Mengzhou Wu

Peking University

Yuzhe Guo

Beijing Jiaotong University

Yuan Cao

Peking University

China

Haochuan Lu

Tencent

China

Hengyu Zhang

Tencent Inc.

Xia Zeng

Tencent

China

Liangchao Yao

Tencent Inc.

Yuetang Deng

Tencent

China

Dezhi Ran

Peking University

China

Wei Yang

UT Dallas

United States

Tao Xie

Peking University

China

Session Program

Wed 19 Nov
Displayed time zone: Seoul

16:00 - 17:00	Testing & Analysis 3 (NIER Track / Industry Showcase) at Grand Hall 1

16:00 10m Talk		Acceleration of Automotive Software Development by Retrieval Augmented Integration Test Script Generation Industry Showcase Masashi Mizoguchi Hitachi Ltd., Kentaro Yoshimura Hitachi, Ltd., Keita Nakazawa Astemo, Ltd., Yasuomi D. Sato Astemo, Ltd., Takahiro Iida Astemo, Ltd., Fumio Narisawa Astemo, Ltd.
16:10 10m Talk		LLM-Powered Fully Automated Chaos Engineering: Towards Enabling Anyone to Build Resilient Software Systems at Low Cost NIER Track Daisuke Kikuta NTT, Inc., Hiroki Ikeuchi NTT, Inc., Kengo Tajiri NTT, Inc.
16:20 10m Talk		Practical Escape of Exploration Tarpits for Mini-Game Testing in an Industrial Setting Industry Showcase Yuan Cao Peking University, Dezhi Ran Peking University, Haochuan Lu Tencent, Chao Guo Tencent Inc., Xuran Hao Peking University, Zhuoru Chen Capital Normal University, Ting Xiong Tencent Inc., Yuetang Deng Tencent, Tao Xie Peking University
16:30 10m Talk		Streamlining Acceptance Test Generation for Mobile Applications Through Large Language Models: An Industrial Case Study Industry Showcase Pedro Luís Fonseca Critical TechWorks and Faculty of Engineering, University of Porto, Bruno Lima LIACC, Faculty of Engineering, University of Porto, João Pascoal Faria Faculty of Engineering, University of Porto and INESC TEC
16:40 10m Talk		Context-Sensitive Pointer Analysis for ArkTS Industry Showcase Yizhuo Yang Beihang University, Lingyun Xu Huawei, Mingyi Zhou Beihang University, Li Li Beihang University
16:50 10m Talk		Element-Aware Fine-Tuning of Vision-Language Models for Cost-Efficient GUI Testing in an Industrial Setting Industry Showcase Mengzhou Wu Peking University, Yuzhe Guo Beijing Jiaotong University, Yuan Cao Peking University, Haochuan Lu Tencent, Hengyu Zhang Tencent Inc., Xia Zeng Tencent, Liangchao Yao Tencent Inc., Yuetang Deng Tencent, Dezhi Ran Peking University, Wei Yang UT Dallas, Tao Xie Peking University