Predicting the Root Cause of Flaky Tests Based on Test Smells (ICSR 2025 - 22nd International Conference on Systems and Software Reuse)

Who

Jing Wang, Weixi Zhang, Ruilian Zhao, Ying Shang

Track

ICSR 2025

Time Zone

Times are shown in the conference time zone: (GMT-04:00) Eastern Time (US & Canada). The GMT offset reflects the offset at the time of the conference.

When

Sun 27 Apr 2025 16:45 - 17:15 at 204 - Session 4: Reusable models and Testing. Chair(s): Dalila Tamzalit

Abstract

Flaky tests are test cases that behave inconsistently across multiple executions, passing or failing unpredictably. They are frequently associated with suboptimal design practices that testers adopt when writing test cases, which undermine the quality of software testing. Identifying the root causes of flaky tests is crucial for fixing them. Inspired by the success of Large Language Models (LLMs), researchers currently use pre-trained language models to embed flaky test code as vectors and predict its root cause category based on vector similarity measures. However, code embeddings generated by LLMs mainly capture general semantic features and lack sufficient comprehension of the behavioral patterns involved in test scenarios, which makes root cause identification ineffective. Test smells, which reflect poor coding practices or habits in writing test cases, provide complementary information for identifying the root causes of test flakiness. Therefore, this paper proposes a flaky test root cause identification method based on test smells. The method uses test smells to abstract and express the behavioral patterns of test code and integrates general semantic features extracted via vector embeddings to enhance the feature representation of flaky tests. Furthermore, to capture the complex nonlinear relationships between test smell features and code embeddings, a Feedforward Neural Network is constructed to categorize the root causes of test flakiness. To validate the effectiveness of our method, we performed evaluations on a dataset consisting of 451 Java flaky test cases. The experimental results indicate that our method achieves an F1-score of 80%, which is 7% higher than that of the baseline model that does not incorporate test smells.
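
The feature-fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the smell names, root cause categories, embedding values, layer sizes, and all function names below are hypothetical, and the weights are random rather than trained. It only shows how a test smell indicator vector might be concatenated with a code embedding and passed through a small feedforward network that outputs one probability per root cause category.

```python
import math
import random

# Hypothetical smell names and root cause categories, for illustration only.
SMELLS = ["sleepy_test", "resource_optimism", "assertion_roulette"]
CATEGORIES = ["async_wait", "concurrency", "time", "unordered_collection"]


def smell_vector(detected):
    """Encode the detected smells as a binary indicator vector."""
    return [1.0 if smell in detected else 0.0 for smell in SMELLS]


def fuse(embedding, detected):
    """Concatenate the code embedding with the smell indicator vector."""
    return list(embedding) + smell_vector(detected)


def init_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply an affine transformation: one weighted sum plus bias per unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    """Nonlinearity that lets the network model nonlinear interactions."""
    return [max(0.0, v) for v in x]


def softmax(z):
    """Turn raw scores into probabilities (shifted by max for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, hidden, output):
    """Forward pass: dense -> ReLU -> dense -> softmax."""
    return softmax(dense(relu(dense(features, hidden)), output))


rng = random.Random(0)
embedding = [0.12, -0.40, 0.33, 0.05]  # stand-in for a real code embedding
features = fuse(embedding, {"sleepy_test"})
hidden = init_layer(len(features), 8, rng)    # untrained, random weights
output = init_layer(8, len(CATEGORIES), rng)  # untrained, random weights
probs = predict_proba(features, hidden, output)
predicted = CATEGORIES[probs.index(max(probs))]
```

Because the weights are untrained, `predicted` carries no meaning here. In a real setting the embedding would come from a pre-trained code model, the smell vector from a test smell detector, and the network would be trained on labeled flaky tests.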

File attachments

Predicting the Root Cause of Flaky Tests Based on Test Smells (Predicting_the_Root_Cause_of_Flaky_Tests_Based_on_Test_Smells.pdf)	955KiB

Jing Wang

College of Information Science and Technology, Beijing University of Chemical Technology

Weixi Zhang

College of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing, China

China

Ruilian Zhao

Beijing University of Chemical Technology