Learning without Forgetting: Towards Continual learning of Fault Localization Models in Industrial Software Systems
Learning-based fault localization has achieved promising results. However, because software and tests evolve constantly, models trained on old data become ineffective on new data. In system testing of large-scale software in particular, each iteration generates a large volume of new data. Retraining the model from scratch therefore incurs an unacceptable time overhead, while merely fine-tuning on new data leads to catastrophic forgetting. Continual learning offers an effective way for models to avoid catastrophic forgetting during this iterative process. However, existing continual learning methods are not designed specifically for fault localization or for large-scale software system testing, so applying them directly yields sub-optimal effectiveness. In response, we propose Canto, a novel continual learning framework designed specifically for large-scale software fault localization. Canto first extracts fine-grained program semantics from logs and uses fault characteristics to increase the weights of certain semantics. It then uses an unsupervised algorithm to obtain the corresponding embeddings and selects representative exemplars based on clustering. Finally, Canto mixes the representative exemplars with new samples for training and adjusts the loss weight according to the model’s degree of mastery over each sample. This lets the model focus during training on samples it has not yet mastered, enabling it to learn new faults while mitigating the forgetting of old ones. In extensive evaluations against 6 continual learning baselines, Canto demonstrates superior performance, improving overall effectiveness by 17.30% to 45.23%.
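The two mechanisms named in the abstract, clustering-based exemplar selection and mastery-dependent loss weighting, can be illustrated with a minimal sketch. This is not Canto's implementation: the abstract does not specify the clustering algorithm, the embedding method, or the weighting formula. The sketch assumes cluster labels are already available, picks the sample nearest each cluster centroid as the exemplar, and uses a focal-loss-style factor `(1 - p) ** gamma` as a stand-in for "degree of mastery". The function names and the `gamma` parameter are hypothetical.

```python
import math


def select_exemplars(embeddings, cluster_labels):
    """Pick one representative per cluster: the member closest to the centroid.

    Assumption: cluster_labels come from some unsupervised clustering step
    that is not shown here; the abstract does not say which one Canto uses.
    """
    clusters = {}
    for idx, label in enumerate(cluster_labels):
        clusters.setdefault(label, []).append(idx)

    chosen = []
    for label in sorted(clusters):
        members = clusters[label]
        dim = len(embeddings[members[0]])
        centroid = [
            sum(embeddings[i][d] for i in members) / len(members)
            for d in range(dim)
        ]
        nearest = min(members, key=lambda i: math.dist(embeddings[i], centroid))
        chosen.append(nearest)
    return chosen


def mastery_weighted_loss(probs_true, gamma=2.0):
    """Mean cross-entropy, down-weighted for samples the model already masters.

    probs_true[i] is the predicted probability of the correct label for
    sample i. The weight (1 - p) ** gamma is a focal-loss-style stand-in,
    not the formula from the paper.
    """
    weighted = [((1.0 - p) ** gamma) * -math.log(p) for p in probs_true]
    return sum(weighted) / len(weighted)


# Toy data: two well-separated clusters of old samples.
old_embeddings = [[0, 0], [1, 0], [2, 0], [10, 10], [10, 11], [10, 12]]
old_clusters = [0, 0, 0, 1, 1, 1]
exemplar_ids = select_exemplars(old_embeddings, old_clusters)

# A mastered sample (p = 0.9) contributes far less than a hard one (p = 0.3).
easy = mastery_weighted_loss([0.9])
hard = mastery_weighted_loss([0.3])
```

In a replay-style training loop, the selected exemplars would be mixed into each batch of new samples, and the weighted loss would steer updates toward samples the model still gets wrong, whether old or new.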
Wed 15 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | Testing and Analysis 1 | SE In Practice (SEIP) / Research Track at Oceania IX | Chair(s): Michael Pradel (CISPA Helmholtz Center for Information Security)
11:00 | 15m | Talk | BFix: Automated Safe Memory-Leak Fixing for Binary Code | Research Track | Wen Zhang (University of Georgia), Botang Xiao (University of Georgia), Qingchen Kong (University of Georgia), Boyang Yi (University of Georgia), Suxin Ji (University of Georgia, USA), Yage Hu (University of Georgia), Songlan Wang (University of Georgia), Wenwen Wang (University of Georgia)
11:15 | 15m | Talk | Learning without Forgetting: Towards Continual learning of Fault Localization Models in Industrial Software Systems | Research Track | Chun Li (Nanjing University), Hui Li (Samsung Electronics (China) R&D Centre), Zhong Li (Nanjing University), Minxue Pan (Nanjing University), Xuandong Li (Nanjing University)
11:30 | 15m | Talk | Memory-Efficient Large Language Models for Program Repair with Semantic-Guided Patch Generation | Research Track | Thanh Le-Cong (Singapore University of Technology and Design, Singapore), Xuan-Bach D. Le (University of Melbourne), Toby Murray (University of Melbourne)
11:45 | 15m | Talk | Addressing Test Flakiness: Practical Approaches in a Database-Reliant Industrial System | SE In Practice (SEIP) | George Vegelien (Delft University of Technology), Carolin Brandt (Delft University of Technology), Bas Graaf (Exact), Arie van Deursen (TU Delft)
12:00 | 15m | Talk | XTrace: A Non-Invasive Dynamic Tracing Framework for Android Applications in Production | SE In Practice (SEIP) | Qi Hu (ByteDance), Jiangchao Liu (ByteDance), Lin Zhang (ByteDance), Edward Jiang (ByteDance), Xin Yu (ByteDance)
12:15 | 15m | Talk | Delta Debugging for LLM-integrated Systems | SE In Practice (SEIP) | Hao-Nan Zhu (University of California, Davis), Muhammad Numair Mansur (Amazon Web Services), Martin Schäf (Amazon Web Services), Zeya Chen (Amazon Web Services), Tancrède Lepoint (Amazon), Willem Visser (Amazon Web Services)