ASE 2024
Sun 27 October - Fri 1 November 2024 Sacramento, California, United States
Tue 29 Oct 2024 14:00 - 14:15 at Carr - Web and UI Chair(s): Mattia Fazzini

The prominent role of mobile apps in daily life has underscored the need for robust quality assurance, driving the development of various automated Android GUI testing approaches. Code coverage and fault detection are the two primary metrics for evaluating the effectiveness of these approaches. However, conducting a reliable and robust evaluation based on these two metrics remains challenging, owing to imperfections in the current evaluation system, which entangles numerous metric granularities and is subject to multiple sources of nondeterminism in tests. For instance, evaluation based solely on the mean or total number of detected faults lacks statistical robustness, producing numerous conflicting conclusions that impede a comprehensive understanding among stakeholders in Android testing and thereby hinder the advancement of Android testing methodologies. To mitigate these issues, this paper presents the first comprehensive statistical study of existing Android GUI testing metrics, involving extensive experiments with 8 state-of-the-art testing approaches on 42 diverse apps and examining statistical significance, correlation, and variation. Our study focuses on two primary areas: ① the statistical significance of and correlation between test metrics and among different metric granularities; ② the influence of test randomness and test convergence on the evaluation results of test metrics. By employing statistical analysis to account for the considerable influence of randomness, we reach notable findings: ① Instruction, ELOC, and method coverage demonstrate notable consistency across both significance evaluation and mean-value evaluation, while the evaluation of fatal errors versus core vitals, as well as all errors versus well-selected errors, reveals a similarly high level of consistency. ② There are evident inconsistencies between code coverage and fault detection results, indicating that both metrics should be considered for a comprehensive evaluation. ③ Code coverage typically exhibits greater stability and robustness in evaluation than fault detection, whereas fault detection remains unstable even with the maximum number of test rounds used in previous research. ④ A moderate test duration is sufficient for most approaches to demonstrate their overall effectiveness on most apps in both code coverage and fault detection, suggesting that a moderate test duration can be adopted to draw preliminary conclusions during Android testing development. These findings inform practical recommendations and support our proposal of an effective framework to enhance future mobile testing evaluations.
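
To make the statistical methodology concrete, the following is a minimal sketch (Python with SciPy; not the authors' artifact) of the kind of analysis the abstract describes: a non-parametric significance test and an effect size to compare two testing tools across repeated runs despite test randomness, plus a rank correlation to check how strongly two coverage granularities agree. All tool names and numbers below are illustrative assumptions.

# A minimal sketch, assuming SciPy is installed; the per-round coverage
# numbers are invented for illustration, not data from the paper.
from scipy.stats import mannwhitneyu, spearmanr

# Hypothetical per-round method coverage (%) for two tools on one app.
tool_a = [41.2, 39.8, 42.5, 40.1, 41.9, 38.7, 42.0, 40.6, 41.3, 39.5]
tool_b = [36.4, 37.9, 35.2, 38.1, 36.8, 37.3, 35.9, 38.4, 36.1, 37.0]

# Two-sided Mann-Whitney U test: is the observed difference statistically
# significant, or plausibly an artifact of test randomness?
stat, p = mannwhitneyu(tool_a, tool_b, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.4f}")

# Vargha-Delaney A12 effect size: the probability that a random run of
# tool A beats a random run of tool B (0.5 means no difference).
wins = sum((a > b) + 0.5 * (a == b) for a in tool_a for b in tool_b)
a12 = wins / (len(tool_a) * len(tool_b))
print(f"A12 effect size = {a12:.2f}")

# Spearman rank correlation between two metric granularities (e.g., method
# vs. instruction coverage) measured on the same runs: a high correlation
# suggests the two granularities rank runs consistently.
method_cov = tool_a
instr_cov = [33.5, 32.1, 34.8, 32.6, 34.0, 31.2, 34.1, 33.0, 33.7, 31.9]
rho, p_rho = spearmanr(method_cov, instr_cov)
print(f"Spearman rho = {rho:.2f}, p = {p_rho:.4f}")

In this style of evaluation, significance tests over many repeated rounds replace comparisons of single mean or total values, which is the statistical robustness the abstract argues for.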

Tue 29 Oct

Displayed time zone: Pacific Time (US & Canada)

13:30 - 15:00
Web and UI (Research Papers / Industry Showcase / Tool Demonstrations) at Carr
Chair(s): Mattia Fazzini University of Minnesota
13:30
15m
Talk
Beyond Manual Modeling: Automating GUI Model Generation Using Design Documents
Research Papers
Shaoheng Cao Nanjing University, Renyi Chen Samsung Electronics (China) R&D Centre, Minxue Pan Nanjing University, Wenhua Yang Nanjing University of Aeronautics and Astronautics, Xuandong Li Nanjing University
13:45
15m
Talk
Towards a Robust Waiting Strategy for Web GUI Testing for an Industrial Software System
Industry Showcase
Haonan Zhang University of Waterloo, Lizhi Liao Memorial University of Newfoundland, Zishuo Ding The Hong Kong University of Science and Technology (Guangzhou), Weiyi Shang University of Waterloo, Nidhi Narula ERA Environmental, Catalin Sporea ERA Environmental Management Solutions, Andrei Toma ERA Environmental Management Solutions, Sarah Sajedi ERA Environmental Management Solutions
14:00
15m
Talk
Navigating Mobile Testing Evaluation: A Comprehensive Statistical Analysis of Android GUI Testing Metrics
Research Papers
Yuanhong Lan Nanjing University, Yifei Lu Nanjing University, Minxue Pan Nanjing University, Xuandong Li Nanjing University
14:15
15m
Talk
Can Cooperative Multi-Agent Reinforcement Learning Boost Automatic Web Testing? An Exploratory Study
Research Papers
Yujia Fan Southern University of Science and Technology, Sinan Wang Southern University of Science and Technology, Zebang Fei Southern University of Science and Technology, Yao Qin Southern University of Science and Technology, Huaxuan Li Southern University of Science and Technology, Yepang Liu Southern University of Science and Technology
14:30
10m
Talk
Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat
Industry Showcase
Sidong Feng Monash University, Haochuan Lu Tencent, Jianqin Jiang Tencent Inc., Ting Xiong Tencent Inc., Likun Huang Tencent Inc., Yinglin Liang Tencent Inc., Xiaoqin Li Tencent Inc., Yuetang Deng Tencent, Aldeida Aleti Monash University
14:40
10m
Talk
Self-Elicitation of Requirements with Automated GUI Prototyping
Tool Demonstrations
Kristian Kolthoff Institute for Enterprise Systems (InES), University of Mannheim, Christian Bartelt, Simone Paolo Ponzetto Data and Web Science Group, University of Mannheim, Kurt Schneider Leibniz Universität Hannover, Software Engineering Group