ASE 2024
Sun 27 October - Fri 1 November 2024 Sacramento, California, United States
Tue 29 Oct 2024 14:00 - 14:15 at Carr - Web and UI Chair(s): Mattia Fazzini

The prominent role of mobile apps in daily life has underscored the need for robust quality assurance, driving the development of various automated Android GUI testing approaches. Code coverage and fault detection are the two primary metrics for evaluating the effectiveness of these approaches. However, conducting a reliable and robust evaluation based on these two metrics remains challenging, owing to imperfections in the current evaluation system, which entangles numerous metric granularities and is subject to multiple sources of nondeterminism in tests. For instance, evaluation based solely on the mean or total number of detected faults lacks statistical robustness, producing numerous conflicting conclusions that impede a comprehensive understanding among stakeholders in Android testing and thereby hinder the advancement of Android testing methodologies. To mitigate these issues, this paper presents the first comprehensive statistical study of existing Android GUI testing metrics, involving extensive experiments with 8 state-of-the-art testing approaches on 42 diverse apps and examining statistical significance, correlation, and variation. Our study focuses on two primary areas: ① the statistical significance of and correlation between test metrics and among different metric granularities; ② the influence of test randomness and test convergence on the evaluation results of test metrics. By employing statistical analysis to account for the considerable influence of randomness, we reach notable findings: ① Instruction, ELOC, and method coverage demonstrate notable consistency across both significance evaluation and mean-value evaluation, while the evaluation of fatal errors versus core vitals, as well as all errors versus well-selected errors, reveals a similarly high level of consistency. ② There are evident inconsistencies between code coverage and fault detection results, indicating that both metrics should be considered for a comprehensive evaluation. ③ Code coverage typically exhibits greater stability and robustness in evaluation than fault detection, whereas fault detection remains unstable even with the maximum number of test rounds used in previous research. ④ A moderate test duration is sufficient for most approaches to demonstrate their overall effectiveness on most apps in both code coverage and fault detection, suggesting that a moderate test duration can be adopted to draw preliminary conclusions during Android testing development. These findings inform practical recommendations and support our proposal of an effective framework to enhance future mobile testing evaluations.
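
To make the statistical methodology concrete, the following is a minimal sketch (Python with SciPy; not the authors' artifact) of the kind of analysis the abstract describes: a non-parametric significance test and an effect size to compare two testing tools across repeated runs despite test randomness, plus a rank correlation to check how strongly two coverage granularities agree. All tool names and numbers below are illustrative assumptions.

# A minimal sketch, assuming SciPy is installed; the per-round coverage
# numbers are invented for illustration, not data from the paper.
from scipy.stats import mannwhitneyu, spearmanr

# Hypothetical per-round method coverage (%) for two tools on one app.
tool_a = [41.2, 39.8, 42.5, 40.1, 41.9, 38.7, 42.0, 40.6, 41.3, 39.5]
tool_b = [36.4, 37.9, 35.2, 38.1, 36.8, 37.3, 35.9, 38.4, 36.1, 37.0]

# Two-sided Mann-Whitney U test: is the observed difference statistically
# significant, or plausibly an artifact of test randomness?
stat, p = mannwhitneyu(tool_a, tool_b, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")

# Vargha-Delaney A12 effect size: the probability that a random run of
# tool A beats a random run of tool B (0.5 means no difference).
wins = sum((a > b) + 0.5 * (a == b) for a in tool_a for b in tool_b)
a12 = wins / (len(tool_a) * len(tool_b))
print(f"A12 effect size = {a12:.2f}")

# Spearman rank correlation between two metric granularities (e.g., method
# vs. instruction coverage) measured on the same runs: a high correlation
# suggests the two granularities rank runs consistently.
method_cov = tool_a
instr_cov = [33.5, 32.1, 34.8, 32.6, 34.0, 31.2, 34.1, 33.0, 33.7, 31.9]
rho, p_rho = spearmanr(method_cov, instr_cov)
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.4f}")

In this style of evaluation, significance tests over many repeated rounds replace comparisons of single mean or total values, which is the statistical robustness the abstract argues for.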

Tue 29 Oct

Displayed time zone: Pacific Time (US & Canada)

13:30 - 15:00
Web and UI (Research Papers / Industry Showcase / Tool Demonstrations) at Carr
Chair(s): Mattia Fazzini University of Minnesota
13:30
15m
Talk
Beyond Manual Modeling: Automating GUI Model Generation Using Design Documents
Research Papers
Shaoheng Cao Nanjing University, Renyi Chen Samsung Electronics (China) R&D Centre, Minxue Pan Nanjing University, Wenhua Yang Nanjing University of Aeronautics and Astronautics, Xuandong Li Nanjing University
13:45
15m
Talk
Towards a Robust Waiting Strategy for Web GUI Testing for an Industrial Software System
Industry Showcase
Haonan Zhang University of Waterloo, Lizhi Liao Memorial University of Newfoundland, Zishuo Ding The Hong Kong University of Science and Technology (Guangzhou), Weiyi Shang University of Waterloo, Nidhi Narula ERA Environmental, Catalin Sporea ERA Environmental Management Solutions, Andrei Toma ERA Environmental Management Solutions, Sarah Sajedi ERA Environmental Management Solutions
14:00
15m
Talk
Navigating Mobile Testing Evaluation: A Comprehensive Statistical Analysis of Android GUI Testing Metrics
Research Papers
Yuanhong Lan Nanjing University, Yifei Lu Nanjing University, Minxue Pan Nanjing University, Xuandong Li Nanjing University
14:15
15m
Talk
Can Cooperative Multi-Agent Reinforcement Learning Boost Automatic Web Testing? An Exploratory Study
Research Papers
Yujia Fan Southern University of Science and Technology, Sinan Wang Southern University of Science and Technology, Zebang Fei Southern University of Science and Technology, Yao Qin Southern University of Science and Technology, Huaxuan Li Southern University of Science and Technology, Yepang Liu Southern University of Science and Technology
14:30
10m
Talk
Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat
Industry Showcase
Sidong Feng Monash University, Haochuan Lu Tencent, Jianqin Jiang Tencent Inc., Ting Xiong Tencent Inc., Likun Huang Tencent Inc., Yinglin Liang Tencent Inc., Xiaoqin Li Tencent Inc., Yuetang Deng Tencent, Aldeida Aleti Monash University
14:40
10m
Talk
Self-Elicitation of Requirements with Automated GUI Prototyping
Tool Demonstrations
Kristian Kolthoff Institute for Enterprise Systems (InES), University of Mannheim, Christian Bartelt, Simone Paolo Ponzetto Data and Web Science Group, University of Mannheim, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group