Beyond PEFT: Layer-Wise Optimization for More Effective and Efficient Large Code Model Tuning
Large Code Models (LCMs) have demonstrated remarkable effectiveness across various code intelligence tasks. Supervised fine-tuning is essential to optimize their performance for specific downstream tasks. Compared with traditional full-parameter fine-tuning (FFT), Parameter-Efficient Fine-Tuning (PEFT) methods can train LCMs with substantially reduced resource consumption and have gained widespread attention among researchers and practitioners. While existing studies have explored PEFT methods for code intelligence tasks, they have predominantly focused on a limited subset of scenarios, such as code generation with publicly available datasets, which constrains the generalizability of their findings. To mitigate this limitation, we conduct a comprehensive study of the effectiveness of PEFT methods across five code intelligence tasks involving both public and private data. Our extensive experiments reveal a considerable performance gap between PEFT methods and FFT, contrary to the findings of existing studies. We also find that this disparity is particularly pronounced in tasks involving private data.
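For readers unfamiliar with PEFT, the sketch below illustrates the kind of LoRA-style setup such studies typically compare against full-parameter fine-tuning. It assumes the Hugging Face transformers and peft libraries; the model name, target modules, and hyperparameters are illustrative assumptions, not the configuration used in the paper.

# A minimal, illustrative PEFT (LoRA) setup; all names and values here are
# assumptions for illustration, not the paper's experimental configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

# LoRA freezes the base weights and trains small low-rank adapter matrices
# injected into selected modules (here, the attention query/value projections).
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names depend on the architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable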
To improve the tuning performance of LCMs while reducing resource utilization during training, we propose a Layer-Wise Optimization (LWO) strategy in this paper. LWO incrementally updates the parameters of each layer of the whole model architecture, without introducing any additional component or inference overhead. Experiments across five LCMs and five code intelligence tasks demonstrate that LWO trains LCMs more effectively and efficiently than previous PEFT methods, with significant improvements on tasks using private data. For instance, in the line-level code completion task using our private code repositories, LWO outperforms the state-of-the-art LoRA method by 22% and 12% in terms of accuracy and BLEU scores, respectively. Furthermore, LWO enables more efficient LCM tuning, reducing training time by an average of 42.7% compared to LoRA.
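The abstract does not spell out LWO's exact update schedule, but a minimal sketch of the layer-wise idea, assuming a Hugging Face-style decoder that exposes model.model.layers and a simple one-layer-per-step cycling schedule, could look like the following. This is an illustration of incremental per-layer updates, not the authors' LWO implementation.

import torch

def train_layer_wise(model, dataloader, compute_loss, epochs=1, lr=2e-5):
    # Hypothetical layer grouping: one parameter group per transformer block,
    # assuming a decoder that exposes its blocks as model.model.layers.
    groups = [list(layer.parameters()) for layer in model.model.layers]
    optimizers = [torch.optim.AdamW(g, lr=lr) for g in groups]

    for p in model.parameters():      # start with every weight frozen
        p.requires_grad_(False)

    step = 0
    model.train()
    for _ in range(epochs):
        for batch in dataloader:
            idx = step % len(groups)  # cycle through layers, one layer per step
            for p in groups[idx]:
                p.requires_grad_(True)
            loss = compute_loss(model, batch)
            loss.backward()           # only the active layer's parameters accumulate gradients
            optimizers[idx].step()
            optimizers[idx].zero_grad()
            for p in groups[idx]:     # re-freeze before moving to the next step
                p.requires_grad_(False)
            step += 1

Because no adapter modules are added in such a scheme, the tuned model keeps the same architecture and inference cost as the original, consistent with the abstract's claim of no additional components or inference overhead.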
Wed 25 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:00 - 12:30 | SE and AI 2 | Ideas, Visions and Reflections / Research Papers | Cosmos Hall | Chair(s): Massimiliano Di Penta (University of Sannio, Italy)
11:00 (20m) Talk | Beyond PEFT: Layer-Wise Optimization for More Effective and Efficient Large Code Model Tuning | Research Papers | Chaozheng Wang (The Chinese University of Hong Kong), Jia Feng (University of Electronic Science and Technology of China), Shuzheng Gao (Chinese University of Hong Kong), Cuiyun Gao (Harbin Institute of Technology, Shenzhen), Zongjie Li (Hong Kong University of Science and Technology), Ting Peng (Tencent Inc.), Hailiang Huang (Tencent Inc.), Yuetang Deng (Tencent), Michael Lyu (Chinese University of Hong Kong)
11:20 (20m) Talk | Automated Trustworthiness Oracle Generation for Machine Learning Text Classifiers | Research Papers | Lam Nguyen Tung (Monash University, Australia), Steven Cho (The University of Auckland, New Zealand), Xiaoning Du (Monash University), Neelofar Neelofar (Royal Melbourne Institute of Technology (RMIT)), Valerio Terragni (University of Auckland), Stefano Ruberto (JRC European Commission), Aldeida Aleti (Monash University)
11:40 (20m) Talk | A Causal Learning Framework for Enhancing Robustness of Source Code Models | Research Papers | Junyao Ye (Huazhong University of Science and Technology), Zhen Li (Huazhong University of Science and Technology), Xi Tang (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology), Shouhuai Xu (University of Colorado Colorado Springs), Weizhong Qiang (Huazhong University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
12:00 (20m) Talk | Eliminating Backdoors in Neural Code Models for Secure Code Understanding | Research Papers | Weisong Sun (Nanjing University), Yuchen Chen (Nanjing University), Chunrong Fang (Nanjing University), Yebo Feng (Nanyang Technological University), Yuan Xiao (Nanjing University), An Guo (Nanjing University), Quanjun Zhang (School of Computer Science and Engineering, Nanjing University of Science and Technology), Zhenyu Chen (Nanjing University), Baowen Xu (Nanjing University), Yang Liu (Nanyang Technological University)
12:20 (10m) Talk | Reduction Fusion for Optimized Distributed Data-Parallel Computations via Inverse Recomputation | Ideas, Visions and Reflections | Haoxiang Lin (Microsoft Research), Yang Wang (Microsoft Research Asia), Yanjie Gao (Microsoft Research), Hongyu Zhang (Chongqing University), Ming Wu (Zero Gravity Labs), Mao Yang (Microsoft Research)
This is the main event hall of Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also happen in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The large doors with numbers “1” and “2” provide access to the Cosmos Hall.