Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models
Software testing is a core discipline in software engineering where a large array of research results has been produced, notably in automatic test generation. However, the resulting test suites are often incomplete and can be characterized as simple (e.g., unit tests): they cover only parts of the project, or they are produced after the bug is fixed and can therefore serve only as regression tests. Yet several research challenges, such as automatic program repair, build on the assumption that available test suites are sufficient. There is thus a need to break existing barriers in automatic test case generation. While prior work largely focused on random unit testing inputs, we propose generating test cases that realistically represent complex user execution scenarios which reveal buggy behaviour. Such scenarios are informally described in bug reports, which should therefore be considered natural inputs for specifying bug-triggering test cases. In this work, we investigate the feasibility of performing this generation by leveraging large language models (LLMs) and using bug reports as inputs. Our experiments consider various settings, including the use of ChatGPT, an online service for accessing an LLM, as well as CodeGPT, an existing code-related pre-trained LLM that was fine-tuned for our task. Our study is carried out on the Defects4J dataset. Overall, we experimentally show that bug reports associated with up to 50% of Defects4J bugs can prompt ChatGPT to generate an executable test case. We also show that even new bug reports (i.e., previously unseen data, used to mitigate the data-leakage threat to validity) can be used as input for generating executable test cases.
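To make the idea of using a bug report as LLM input more concrete, the following is a minimal Python sketch, not the authors' actual pipeline. The prompt wording and the helper names `build_prompt` and `extract_java_test` are assumptions introduced for illustration only; the call to the LLM itself is deliberately left out.

```python
import re
from typing import Optional

FENCE = "`" * 3  # Markdown code-fence delimiter used in many LLM replies


def build_prompt(title: str, description: str) -> str:
    """Assemble a test-generation prompt from a bug report.

    The wording is hypothetical; the study's real prompt is not given here.
    """
    return (
        "Write a JUnit test case that reproduces the bug described below.\n"
        f"Bug report title: {title.strip()}\n"
        f"Bug report description: {description.strip()}\n"
    )


def extract_java_test(response: str) -> Optional[str]:
    """Return the first fenced code block in an LLM reply, or None if absent."""
    pattern = re.escape(FENCE) + r"(?:java)?\s*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, response, flags=re.DOTALL)
    return match.group(1).strip() if match else None
```

In such a setup, the extracted test would still have to be compiled and run against the buggy program version to decide whether it is executable and whether it actually triggers the reported bug.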
Thu 18 Apr (displayed time zone: Lisbon)
15:30 - 16:00
15:30 | 30m | Poster | Towards Data Augmentation for Supervised Code Translation | Posters | Binger Chen Technische Universität Berlin, Jacek Golebiowski Amazon AWS, Ziawasch Abedjan Leibniz Universität Hannover
15:30 | 30m | Poster | GDPR indications in commits messages in GitHub repositories | Posters
15:30 | 30m | Poster | Automatic Generation of Test Cases based on Bug Reports: a Feasibility Study with Large Language Models | Posters | Laura Plein University of Luxembourg, Wendkuuni Arzouma Marc Christian OUEDRAOGO University of Luxembourg, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg
15:30 | 30m | Poster | How Does Pre-trained Language Model Perform on Deep Learning Framework Bug Prediction? | Posters | Xiaoting Du Beijing University of Posts and Telecommunications, Chenglong Li Beihang University, Xiangyue Ma Beihang University, Zheng Zheng Beihang University
15:30 | 30m | Poster | xNose: A Test Smell Detector for C# | Posters | Partha Protim Paul Shahjalal University of Science & Technology, Md Tonoy Akanda Shahjalal University of Science & Technology, Mohammed Raihan Ullah Shahjalal University of Science & Technology, Dipto Mondal Shahjalal University of Science & Technology, Nazia Sultana Chowdhury Shahjalal University of Science & Technology, Fazle Mohammed Tawsif University of Southern California
15:30 | 30m | Poster | Data vs. Model Machine Learning Fairness Testing: An Empirical Study | Posters | Arumoy Shome Delft University of Technology, Luís Cruz Delft University of Technology, Arie van Deursen Delft University of Technology
15:30 | 30m | Poster | On the Effects of Program Slicing for Vulnerability Detection during Code Inspection: Extended Abstract | Posters | Aurora Papotti Vrije Universiteit Amsterdam, Fabio Massacci University of Trento; Vrije Universiteit Amsterdam, Katja Tuma Vrije Universiteit Amsterdam
15:30 | 30m | Poster | Multi-step Automated Generation of Parameter Docstrings in Python: An Exploratory Study | Posters | Vatsal Venkatkrishna Australian National University, Durga Shree Nagabushanam Australian National University, Emmanuel Iko-Ojo Simon Australian National University, Melina Vidoni Australian National University
15:30 | 30m | Poster | Lightweight Semantic Conflict Detection with Static Analysis | Posters | Galileu Santos de Jesus Federal University of Pernambuco, Paulo Borba Federal University of Pernambuco, Rodrigo Bonifácio Computer Science Department - University of Brasília, Matheus Barbosa de Oliveira Federal University of Pernambuco
15:30 | 30m | Poster | Energy Consumption of Automated Program Repair | Posters | Matias Martinez Universitat Politècnica de Catalunya (UPC), Silverio Martínez-Fernández UPC-BarcelonaTech, Xavier Franch Universitat Politècnica de Catalunya
15:30 | 30m | Poster | ReviewRanker: A Semi-Supervised Learning Based Approach for Code Review Quality Estimation | Posters | Saifullah Mahbub United International University, Md. Easin Arafat Eötvös Loránd University, Chowdhury Rafeed Rahman National University of Singapore, Zannatul Ferdows United International University, Masum Hasan University of Rochester
15:30 | 30m | Poster | LogPrompt: Prompt Engineering Towards Zero-Shot and Interpretable Log Analysis | Posters | Yilun Liu Huawei co. LTD, Shimin Tao University of Science and Technology of China; Huawei co. LTD, Weibin Meng Huawei co. LTD, Feiyu Yao Huawei co. LTD, Xiaofeng Zhao Huawei co. LTD, Hao Yang Huawei co. LTD
15:30 | 30m | Poster | High-precision Online Log Parsing with Large Language Models | Posters | XiaoLei Chen Fudan University, Jie Shi Fudan University, ChenJ, Peng Wang Fudan University, Wei Wang Fudan University
15:30 | 30m | Poster | Multi-requirement Parametric Falsification | Posters