ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada

Software testing is crucial to ensure the quality of software under development. Once a potential bug is identified, a Bug Report (BR) is opened with information that describes the issue and how to reproduce it. In large companies, hundreds of BRs are typically opened every week by different testing teams, and each of them has to be inspected and fixed adequately. This paper focuses on the use of Machine Learning (ML) techniques to automate Escaped Defect Analysis (EDA), an important (but expensive) task for improving the effectiveness of testing teams. In our work, Escaped Defects (EDs) are bugs or issues that should have been reported by a specific team but were instead found by another team. The occurrence of EDs is risky, as it usually indicates failures in the testing activities. EDA is usually performed manually by software engineers, who read each BR’s textual content to judge whether it is an ED or not, which is challenging and time-consuming. In our solution, the BR’s content is preprocessed with textual operations, converted into a feature representation, and passed to an ML classifier that returns the probability of each EDA label.

In summary, this paper presents an ML-based approach to classify escaped defects described in bug reports, i.e., bugs missed by the QA team in charge and uncovered by a different team. To automate the identification of EDs (a costly and error-prone task), a dataset from a partner company is leveraged, text-processing operators are adopted for feature engineering, and six classical ML algorithms are applied. The results show satisfactory accuracy and AUC, and the experiments indicate a good trade-off between the number of EDs correctly identified and the number of BRs that have to be inspected during EDA.
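
To illustrate the kind of pipeline the abstract describes (preprocess BR text, build a feature representation, train a classifier that outputs an ED probability, and rank BRs for inspection), here is a minimal sketch in Python with scikit-learn. It is not the authors' implementation: the TF-IDF features, the logistic regression model, and the file and column names (bug_reports.csv, summary, description, is_escaped_defect) are assumptions made for illustration only.

```python
# Sketch of an ED-classification pipeline of the kind described above.
# All concrete choices (TF-IDF, logistic regression, column names) are
# illustrative assumptions, not the paper's actual setup.
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def preprocess(text: str) -> str:
    """Basic textual cleanup: lowercase and strip non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


# Hypothetical dataset: one row per BR, with free-text fields and an ED label.
df = pd.read_csv("bug_reports.csv")
df["text"] = (df["summary"].fillna("") + " " + df["description"].fillna("")).map(preprocess)

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["is_escaped_defect"],
    test_size=0.3, stratify=df["is_escaped_defect"], random_state=42,
)

# Feature representation + classifier returning a probability for the ED label.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", max_features=20_000)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

proba = pipeline.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, proba))

# Ranking BRs by predicted probability lets engineers inspect only the top-k
# reports, trading inspection effort against the number of EDs recovered.
ranked = pd.DataFrame({"proba": proba, "label": y_test.values}).sort_values("proba", ascending=False)
top_k = ranked.head(int(0.2 * len(ranked)))  # e.g., inspect the top 20% of BRs
print("EDs recovered in top 20%:", int(top_k["label"].sum()), "of", int(ranked["label"].sum()))
```

The final ranking step mirrors the trade-off mentioned in the abstract: the higher the fraction of BRs inspected, the more EDs are recovered, so the classifier's probabilities can be used to prioritize manual EDA effort.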