ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Wed 30 Apr 2025 16:30 - 16:45 at 213 - AI for Program Comprehension 1 Chair(s): Yintong Huo

Large Language Models for Code (LLMs4Code) have been found to exhibit outstanding performance in the software engineering domain, particularly on coding tasks. However, even the most advanced LLMs4Code inevitably contain incorrect or outdated code knowledge, and because training LLMs4Code is so costly, re-training the models to fix such problematic knowledge is impractical. Model editing is an emerging field that aims to correct erroneous knowledge in LLMs effectively and efficiently, and various model editing techniques and benchmarks have been proposed recently. Nevertheless, a comprehensive study that thoroughly compares and analyzes the effectiveness of state-of-the-art model editing techniques for adapting the knowledge within LLMs4Code across various code-related tasks is notably absent. To bridge this gap, we perform the first systematic study on applying state-of-the-art model editing approaches to repair inaccurate knowledge in LLMs4Code. To that end, we introduce a benchmark named CLMEEval, which consists of two datasets: CoNaLa-Edit (CNLE) with 21K+ code generation samples and CodeSearchNet-Edit (CSNE) with 16K+ code summarization samples. With the help of CLMEEval, we evaluate six advanced model editing techniques on three LLMs4Code models: CodeLlama (7B), CodeQwen1.5 (7B), and Stable-Code (3B). Our findings include that the external memorization-based GRACE approach achieves the best knowledge-editing effectiveness and specificity (the editing does not influence untargeted knowledge), while generalization (whether the editing generalizes to other semantically identical inputs) remains a universal challenge for existing techniques. Building on an in-depth case analysis, we therefore introduce an enhanced version of GRACE called A-GRACE, which incorporates contrastive learning to better capture the semantics of the inputs. Results demonstrate that A-GRACE notably enhances generalization while maintaining levels of effectiveness and specificity similar to vanilla GRACE.
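The abstract names two mechanisms worth making concrete. A GRACE-style editor (Hartvigsen et al., 2023) keeps the model frozen and stores edits in an external codebook of (key, value) pairs at one layer: if an incoming hidden state falls within a deferral radius of a stored key, a stored corrected activation is substituted; otherwise the layer behaves as before, which is consistent with the high specificity reported above. The sketch below is a minimal illustration of that idea, plus an InfoNCE-style contrastive loss of the kind A-GRACE's "contrastive learning" could plausibly use to pull semantically identical inputs toward the same key. It is not the authors' released code; all names (GraceAdapter, epsilon, contrastive_loss) and tensor shapes are assumptions made for illustration.

import torch
import torch.nn as nn

class GraceAdapter(nn.Module):
    # Hypothetical GRACE-style wrapper around one transformer sub-layer.
    # The wrapped model stays frozen; edits live in an external codebook.
    def __init__(self, layer: nn.Module, epsilon: float = 1.0):
        super().__init__()
        self.layer = layer
        self.epsilon = epsilon      # deferral radius around each stored key
        self.keys: list[torch.Tensor] = []    # hidden states of edited inputs
        self.values: list[torch.Tensor] = []  # replacement activations

    def add_edit(self, key: torch.Tensor, value: torch.Tensor) -> None:
        # key: hidden state (d,) of the faulty input at this layer;
        # value: activation (d,) trained to yield the corrected output.
        self.keys.append(key.detach())
        self.values.append(value.detach())

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d). Compute the unedited output first.
        out = self.layer(hidden)
        if not self.keys:
            return out
        query = hidden[:, -1]                 # last-token state, (batch, d)
        keys = torch.stack(self.keys)         # (num_edits, d)
        dists = torch.cdist(query, keys)      # (batch, num_edits)
        nearest = dists.argmin(dim=-1)
        for b in range(query.size(0)):
            # Substitute the stored correction only inside the radius,
            # leaving untargeted inputs untouched (specificity).
            if dists[b, nearest[b]] < self.epsilon:
                out[b, -1] = self.values[nearest[b]]
        return out

def contrastive_loss(anchor: torch.Tensor, positive: torch.Tensor,
                     negatives: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # InfoNCE-style objective: pull a paraphrase encoding (positive)
    # toward the edit key (anchor) and push unrelated inputs (negatives,
    # shape (k, d)) away, so rephrasings hit the same codebook entry.
    pos = torch.cosine_similarity(anchor, positive, dim=-1) / tau
    neg = torch.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1) / tau
    return -(pos - torch.logsumexp(torch.cat([pos.view(1), neg]), dim=0))

Under this reading, vanilla GRACE fails to generalize when a paraphrase's hidden state lands outside every key's radius; training with a loss like the one above widens the semantic neighborhood a key covers without inflating epsilon, which would be consistent with the generalization gains the abstract reports for A-GRACE.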

Wed 30 Apr

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
AI for Program Comprehension 1
Research Track at 213
Chair(s): Yintong Huo Singapore Management University, Singapore
16:00
15m
Talk
ADAMAS: Adaptive Domain-Aware Performance Anomaly Detection in Cloud Service Systems
Research Track
Wenwei Gu The Chinese University of Hong Kong, Jiazhen Gu The Chinese University of Hong Kong, Jinyang Liu The Chinese University of Hong Kong, Zhuangbin Chen Sun Yat-sen University, Jianping Zhang The Chinese University of Hong Kong, Jinxi Kuang The Chinese University of Hong Kong, Cong Feng Huawei Cloud Computing Technology, Yongqiang Yang Huawei Cloud Computing Technology, Michael Lyu The Chinese University of Hong Kong
16:15
15m
Talk
LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models
Research Track
Zeyang Ma Concordia University, Dong Jae Kim DePaul University, Tse-Hsun (Peter) Chen Concordia University
16:30
15m
Talk
Model Editing for LLMs4Code: How Far are We?
Research Track
Xiaopeng Li National University of Defense Technology, Shangwen Wang National University of Defense Technology, Shasha Li National University of Defense Technology, Jun Ma National University of Defense Technology, Jie Yu National University of Defense Technology, Xiaodong Liu National University of Defense Technology, Jing Wang National University of Defense Technology, Bin Ji National University of Defense Technology, Weimin Zhang National University of Defense Technology
Pre-print
16:45
15m
Talk
Software Model Evolution with Large Language Models: Experiments on Simulated, Public, and Industrial Datasets
Research Track
Christof Tinnes Saarland University, Alisa Carla Welter Saarland University, Sven Apel Saarland University
Pre-print
17:00
15m
Talk
SpecRover: Code Intent Extraction via LLMs
Research Track
Haifeng Ruan National University of Singapore, Yuntong Zhang National University of Singapore, Abhik Roychoudhury National University of Singapore
17:15
15m
Talk
Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models
Artifact-Functional, Artifact-Available, Artifact-Reusable
Research Track
Van-Hoang Le The University of Newcastle, Yi Xiao, Hongyu Zhang Chongqing University