Enhancing Exploratory Testing by Large Language Model and Knowledge Graph (ICSE 2024 - Research Track)

Who

Yanqi Su, Dianshu Liao, Zhenchang Xing, Qing Huang, Mulong Xie, Qinghua Lu, Xiwei (Sherry) Xu

Track

ICSE 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

Use conference time zone: (GMT+01:00) LisbonSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Fri 19 Apr 2024 11:00 - 11:15 at Grande Auditório - LLM, NN and other AI technologies 5 Chair(s): Baishakhi Ray

Abstract

Exploratory testing leverages the tester’s knowledge and creativity to design test cases for effectively uncovering system-level bugs from the end user’s perspective. Researchers have worked on test scenario generation to support exploratory testing based on a system knowledge graph, enriched with scenario and oracle knowledge from bug reports. Nevertheless, the adoption of this approach is hindered by difficulties in handling bug reports of inconsistent quality and varied expression styles, along with the infeasibility of the generated test scenarios. To overcome these limitations, we utilize the superior natural language understanding (NLU) capabilities of Large Language Models (LLMs) to construct a System KG of User Tasks and Failures (SysKG-UTF). Leveraging the system and bug knowledge from the KG, along with the logical reasoning capabilities of LLMs, we generate test scenarios with high feasibility and coherence. Particularly, we design chain-of-thought (CoT) reasoning to extract human-like knowledge and logical reasoning from LLMs, simulating a developer’s process of validating test scenario feasibility. Our evaluation shows that our approach significantly enhances the KG construction, particularly for bug reports with low quality. Furthermore, our approach generates test scenarios with high feasibility and coherence. The user study further proves the effectiveness of our generated test scenarios in supporting exploratory testing. Specifically, 8 participants find 36 bugs from 8 seed bugs in two hours using our test scenarios, a significant improvement over the 21 bugs found by the state-of-the-art baseline.

Yanqi Su

Australian National University

Dianshu Liao

Australian National University

Zhenchang Xing

CSIRO's Data61

Qing Huang

School of Computer Information Engineering, Jiangxi Normal University

China

Mulong Xie

CSIRO's Data61

Qinghua Lu

Data61, CSIRO

Australia

Xiwei (Sherry) Xu