Can ChatGPT Repair Non-Order-Dependent Tests?
Regression testing helps developers check whether the latest code changes break software functionality. Flaky tests, which can non-deterministically pass or fail on the same code version, may mislead developers, causing them to miss real bugs or spend time pinpointing bugs that do not exist. Existing flakiness detection and mitigation techniques have primarily focused on general order-dependent (OD) and implementation-dependent (ID) flaky tests. Research on repairing test flakiness is also scarce: most of it has focused on repairing OD flaky tests, and only a few studies have explored repairing a subcategory of non-order-dependent (NOD) flaky tests, namely those caused by asynchronous waits. As a result, there is a demand for techniques to reproduce, detect, and repair NOD flaky tests. Large language models (LLMs) have shown great effectiveness in several programming tasks. To explore the potential of LLMs in addressing NOD flakiness, this paper investigates the possibility of using ChatGPT to repair different categories of NOD flaky tests. Our comprehensive study on 118 NOD flaky tests from the IDoFT dataset shows that ChatGPT, despite being a leading LLM with notable success in multiple code generation tasks, is ineffective in repairing NOD test flakiness, even when following the best practices for prompt crafting. We investigated the reasons why ChatGPT fails to repair NOD flaky tests, which gave us valuable insights into the next steps for advancing the field of NOD test flakiness repair.
Sun 14 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | Debugging Flaky Tests in Different Domains | FTW at Amália Rodrigues | Chair(s): Owain Parry (The University of Sheffield)
14:00 | 30m | Paper | On the Impact of Hitting System Resource Limits on Test Flakiness | FTW | Fabian Leinen (Technical University of Munich), Alexander Perathoner (Technical University of Munich), Alexander Pretschner (TU Munich)
14:30 | 30m | Paper | Flaky Tests in the AI Domain | FTW | Péter Attila Soha (Department of Software Engineering, University of Szeged), Béla Vancsics, Tamás Gergely (Department of Software Engineering, University of Szeged), Árpád Beszédes (Department of Software Engineering, University of Szeged)
15:00 | 30m | Paper | Can ChatGPT Repair Non-Order-Dependent Tests? | FTW | Yang Chen (University of Illinois at Urbana-Champaign), Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign)