An Empirical Comparison between Monkey Testing and Human Testing (Work in progress)
Android app testing is challenging and time-consuming because it is difficult to fully cover all feasible execution paths. Apps are usually tested in two alternative ways: human testing vs. automated testing. Prior studies focused on comparing different automated tools. However, some fundamental questions remain unexplored, including (1) how automated testing behaves differently from human testing, and (2) whether automated testing can fully or partially substitute for human testing.
This paper presents our empirical study investigating these open questions. Because Monkey is considered one of the best automated testing tools due to its usability and relatively good coverage, we applied Monkey to five Android apps from different domains and collected their dynamic event traces. Meanwhile, we recruited eight users to manually test the same apps and gathered their traces as well. By comparing the run-time information collected from both testing methods, we found that (i) on average, the two methods triggered similar numbers of unique events; (ii) Monkey triggered system events more effectively, while humans triggered more UI events; (iii) Monkey could mimic human behaviors when an app's UI is full of clickable widgets that trigger logically independent events; and (iv) Monkey was insufficient for testing apps that require information comprehension and problem-solving skills. Our research will shed light on future work that combines human expertise with the agility of Monkey testing.