LLMs for Defect Prediction in Evolving Datasets: Emerging Results and Future Directions
Software evolves rapidly, making it difficult for defect prediction models to remain effective without frequent retraining. While Large Language Models (LLMs) have demonstrated impressive capabilities in software engineering tasks, their adaptability to evolving codebases remains under-explored. This study investigates dynamic fine-tuning techniques that enable LLMs to predict defective modules as codebases evolve. We begin by curating datasets from the publicly available QuixBugs benchmark and multiple GitHub software projects. We then follow a dynamic fine-tuning approach to adapt the LLMs to the evolving codebases. To mitigate catastrophic forgetting, the LLMs are evaluated with continual learning strategies such as Elastic Weight Consolidation and memory replay. Preliminary results indicate that LLMs such as LLaMA-LoRA, PolyCoder, and StarCoder, when dynamically fine-tuned, achieve performance comparable to medium-sized models such as CodeBERT, GraphCodeBERT, and CodeT5 across evolving codebases. Through this preliminary study, we also provide actionable insights into applying LLM-based defect prediction in real-world software quality assurance.
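The abstract's Elastic Weight Consolidation (EWC) strategy can be illustrated with a minimal sketch. The idea: when fine-tuning on a new codebase snapshot, parameters that were important for earlier snapshots (those with high Fisher information) are anchored to their previous values by a quadratic penalty added to the task loss. The function names, the plain-list parameter representation, and the `lam` coefficient below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical minimal sketch of the EWC regularizer used in continual
# learning. Parameters are modeled as flat lists of floats for clarity;
# in practice they would be tensors of a fine-tuned LLM.

def ewc_penalty(params, old_params, fisher, lam=0.4):
    """Quadratic penalty pulling important weights toward their old values.

    params:     current parameter values (after updates on the new snapshot)
    old_params: parameter values learned on earlier codebase snapshots
    fisher:     per-parameter Fisher information (importance estimates)
    lam:        strength of the consolidation penalty (assumed value)
    """
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, fisher)
    )


def total_loss(task_loss, params, old_params, fisher, lam=0.4):
    # Overall objective: loss on the new snapshot plus the EWC anchor term.
    return task_loss + ewc_penalty(params, old_params, fisher, lam)
```

A parameter that has not moved contributes nothing to the penalty, while a drift on a high-Fisher parameter is penalized proportionally, which is how the method trades plasticity on new data against retention of earlier knowledge.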
Mon 23 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
16:00 - 18:00 | MSR 1: Industry Papers / Ideas, Visions and Reflections / Research Papers / Journal First, at Aurora B
Chair(s): Andrew Begel (Carnegie Mellon University)

16:00 (20m, Talk) On Refining the SZZ Algorithm with Bug Discussion Data [Journal First]
  Pooja Rani, Fernando Petrulio, Alberto Bacchelli (University of Zurich)

16:20 (20m, Talk) SemBIC: Semantic-aware Identification of Bug-inducing Commits [Research Papers]
  Xiao Chen, Hengcheng Zhu, Jialun Cao, Shing-Chi Cheung (The Hong Kong University of Science and Technology); Ming Wen (Huazhong University of Science and Technology)

16:40 (20m, Talk) Evaluating SZZ Implementations: An Empirical Study on the Linux Kernel [Journal First]
  Yunbo Lyu, Ratnadira Widyasari, David Lo (Singapore Management University); Hong Jin Kang (University of Sydney); Julia Lawall (Inria)

17:00 (10m, Talk) HyperSeq: A Hyper-Adaptive Representation for Predictive Sequencing of States [Ideas, Visions and Reflections]

17:10 (10m, Talk) LLMs for Defect Prediction in Evolving Datasets: Emerging Results and Future Directions [Ideas, Visions and Reflections]
  Umamaheswara Sharma B, Farhan Chonari, Gokul K Anilkumar, Saikiran Konchada (National Institute of Technology Calicut)

17:20 (20m, Talk) ROSE LCOM Tools [Industry Papers]
  Kenneth Lamar, Zachary Painter, Damian Dechev (University of Central Florida); Peter Pirkelbauer (Lawrence Livermore National Laboratory)
Aurora B is the second room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.