LLMs for Defect Prediction in Evolving Datasets: Emerging Results and Future Directions
Software evolves rapidly, making it difficult for defect prediction models to remain effective without frequent retraining. While Large Language Models (LLMs) have demonstrated impressive capabilities in software engineering tasks, their adaptability to evolving codebases remains under-explored. This study investigates dynamic fine-tuning techniques that enable LLMs to predict defective modules as codebases evolve. We begin by curating datasets from the publicly available QuixBugs benchmark and multiple GitHub software projects. We then follow a dynamic fine-tuning approach to adapt the LLMs to the evolving codebases. To mitigate catastrophic forgetting, the LLMs are evaluated with continual learning strategies such as Elastic Weight Consolidation and memory replay. Preliminary results indicate that LLMs such as LLaMA-LoRA, PolyCoder, and StarCoder, when dynamically fine-tuned, achieve performance comparable to medium-sized models such as CodeBERT, GraphCodeBERT, and CodeT5 across evolving codebases. Through this preliminary study, we also provide actionable insights into applying LLM-based defect prediction in real-world software quality assurance.
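The abstract's Elastic Weight Consolidation (EWC) strategy can be illustrated with a minimal sketch. The idea: when fine-tuning on a new codebase snapshot, parameters that were important for earlier snapshots (those with high Fisher information) are anchored to their previous values by a quadratic penalty added to the task loss. The function names, the plain-list parameter representation, and the `lam` coefficient below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical minimal sketch of the EWC regularizer used in continual
# learning. Parameters are modeled as flat lists of floats for clarity;
# in practice they would be tensors of a fine-tuned LLM.

def ewc_penalty(params, old_params, fisher, lam=0.4):
    """Quadratic penalty pulling important weights toward their old values.

    params:     current parameter values (after updates on the new snapshot)
    old_params: parameter values learned on earlier codebase snapshots
    fisher:     per-parameter Fisher information (importance estimates)
    lam:        strength of the consolidation penalty (assumed value)
    """
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, fisher)
    )


def total_loss(task_loss, params, old_params, fisher, lam=0.4):
    # Overall objective: loss on the new snapshot plus the EWC anchor term.
    return task_loss + ewc_penalty(params, old_params, fisher, lam)
```

A parameter that has not moved contributes nothing to the penalty, while a drift on a high-Fisher parameter is penalized proportionally, which is how the method trades plasticity on new data against retention of earlier knowledge.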
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00 - 18:00 | MSR 1: Industry Papers / Ideas, Visions and Reflections / Research Papers / Journal First, at Aurora B
Chair(s): Andrew Begel (Carnegie Mellon University)

16:00 (20m, Talk) On Refining the SZZ Algorithm with Bug Discussion Data [Journal First]
  Pooja Rani, Fernando Petrulio, Alberto Bacchelli (University of Zurich)

16:20 (20m, Talk) SemBIC: Semantic-aware Identification of Bug-inducing Commits [Research Papers]
  Xiao Chen, Hengcheng Zhu, Jialun Cao, Shing-Chi Cheung (The Hong Kong University of Science and Technology); Ming Wen (Huazhong University of Science and Technology)

16:40 (20m, Talk) Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel [Journal First]
  Yunbo Lyu, Ratnadira Widyasari, David Lo (Singapore Management University); Hong Jin Kang (University of Sydney); Julia Lawall (Inria)

17:00 (10m, Talk) HyperSeq: A Hyper-Adaptive Representation for Predictive Sequencing of States [Ideas, Visions and Reflections]

17:10 (10m, Talk) LLMs for Defect Prediction in Evolving Datasets: Emerging Results and Future Directions [Ideas, Visions and Reflections]
  Umamaheswara Sharma B, Farhan Chonari, Gokul K Anilkumar, Saikiran Konchada (National Institute of Technology Calicut)

17:20 (20m, Talk) ROSE LCOM Tools [Industry Papers]
  Kenneth Lamar, Zachary Painter, Damian Dechev (University of Central Florida); Peter Pirkelbauer (Lawrence Livermore National Laboratory)
Aurora B is the second room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.