ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Thu 16 Apr 2026 17:00 - 17:15 at Oceania IX - AI for Software Engineering 19 Chair(s): Fabio Palomba

Fine-tuning plays a crucial role in adapting large code models (LCMs) to specific software engineering tasks. However, fine-tuning LCMs requires perfectly labeled datasets, which are rarely available in practice. Noisy labels in the training data can significantly impair the generalization ability and overall performance of fine-tuned LCMs. Previous work has primarily focused on the problem of noisy labels in training models from scratch, while this problem remains largely unexplored in the context of fine-tuning LCMs.

To fill this gap, this paper proposes RobustFT, the first approach for fine-tuning LCMs in the presence of noisy labels. The core of RobustFT is to distinguish noisy labels from clean ones based on training dynamics observed during the fine-tuning process. Our insight is that, during fine-tuning, the trajectories of mislabeled samples in the latent feature space are significantly longer than those of clean samples. After filtering out noisy labels, RobustFT restarts the fine-tuning process using only the selected clean samples, thus producing a more effective LCM. We evaluate RobustFT on 36 diverse subjects, covering multiple LCMs, code datasets, and varying types and ratios of noisy labels. The results show that RobustFT outperforms five baselines in both identifying noisy labels and enhancing the fine-tuning effectiveness of LCMs.
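The filtering idea above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how trajectory length is measured or how the cutoff between clean and noisy samples is chosen. The sketch assumes latent features are recorded for every training sample at a few checkpoints, takes trajectory length to be the summed Euclidean distance between consecutive checkpoints, and drops a fixed fraction of the longest trajectories. The names `trajectory_lengths`, `select_clean`, and `assumed_noise_ratio` are hypothetical.

```python
import numpy as np


def trajectory_lengths(snapshots: np.ndarray) -> np.ndarray:
    """Return the path length of each sample's feature trajectory.

    snapshots has shape (n_checkpoints, n_samples, dim): the latent
    feature of every sample, recorded at successive checkpoints.
    Path length here is an assumed measure: the sum of Euclidean
    distances between consecutive checkpoints.
    """
    steps = np.diff(snapshots, axis=0)          # (n_checkpoints - 1, n_samples, dim)
    step_norms = np.linalg.norm(steps, axis=2)  # (n_checkpoints - 1, n_samples)
    return step_norms.sum(axis=0)               # (n_samples,)


def select_clean(snapshots: np.ndarray, assumed_noise_ratio: float) -> np.ndarray:
    """Return sorted indices of samples kept as presumably clean.

    Samples with the longest trajectories are treated as suspected
    noisy labels. The cutoff (a fixed fraction) is an assumption of
    this sketch, not something the abstract specifies.
    """
    lengths = trajectory_lengths(snapshots)
    n_drop = int(round(assumed_noise_ratio * len(lengths)))
    order = np.argsort(lengths)                 # shortest trajectory first
    keep = order[: len(lengths) - n_drop]
    return np.sort(keep)
```

Under these assumptions, the indices returned by `select_clean` would identify the subset on which fine-tuning is restarted, matching the filter-then-tune order described in the abstract.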

Thu 16 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

16:00 - 17:30
AI for Software Engineering 19 - Research Track at Oceania IX
Chair(s): Fabio Palomba University of Salerno
16:00
15m
Talk
An Eye for AI: Eye-Tracking the Micro-Interruptions of GenAI Code Suggestions (Artifact Award Winner)
Research Track
Tarek Alakmeh University of Zurich, Sarah D'Angelo Google, Thomas Fritz University of Zurich
16:15
15m
Talk
Inside Out: Uncovering How Comment Internalization Steers LLMs for Better or Worse (Virtual Attendance)
Research Track
Aaron Imani University of California, Irvine, Mohammad Moshirpour University of California, Irvine, Iftekhar Ahmed University of California at Irvine
16:30
15m
Talk
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Research Track
Zhaoyang Chu Huazhong University of Science and Technology, Yao Wan Huazhong University of Science and Technology, Zhikun Zhang Zhejiang University, Di Wang King Abdullah University of Science and Technology, Zhou Yang University of Alberta, Alberta Machine Intelligence Institute, Hongyu Zhang Chongqing University, Pan Zhou Huazhong University of Science and Technology, Xuanhua Shi Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology, David Lo Singapore Management University
16:45
15m
Talk
What Makes Code Generation Ethically Sourced? (Distinguished Paper Award)
Research Track
Zhuolin Xu Concordia University, Chenglin Li Concordia University, Qiushi Li Concordia University, Shin Hwei Tan Concordia University
17:00
15m
Talk
Filtering before Tuning: Robust Fine-Tuning of Large Code Models under Noisy Labels
Research Track
Zhong Li Nanjing University, Yang Chen China Automobile Data of Tianjin Co., Ltd. / China Automotive Technology & Research Center Co., Ltd., Heng Yong Nanjing University, Yuanyi Lin Huawei Technologies, Jiali Zhao Huawei, Tongtong Xu Huawei, Minxue Pan Nanjing University, Tian Zhang Nanjing University, Xuandong Li Nanjing University
17:15
15m
Talk
Automating Requirements Formalization: Using LLMs and Low-Complexity Distinguishing Traces for Semantic Validation
Research Track
Daniel Mendoza Stanford University, Anastasia Mavridou KBR / NASA Ames Research Center, Andreas Katis KBR / NASA Ames Research Center, Caroline Trippel Stanford University