Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs
Large code models (LCMs) have remarkably advanced the field of code generation. Despite their impressive capabilities, they still face practical deployment issues, such as high inference costs, the limited accessibility of proprietary LCMs, and the difficulty of adapting ultra-large LCMs. These issues highlight the critical need for more accessible, lightweight yet effective LCMs. Knowledge distillation (KD) offers a promising solution by transferring the programming capabilities of larger, advanced LCMs (Teacher) to smaller, less powerful LCMs (Student). However, existing KD methods often overlook fault knowledge and rely on static seed knowledge, which limits their effectiveness.
In this paper, we propose a novel Self-Paced knOwledge DistillAtion framework, named SODA, which aims to develop lightweight yet effective student LCMs by continually transferring programming capabilities from advanced teacher LCMs. SODA consists of three stages in one cycle: (1) the Correct-and-Fault Knowledge Delivery stage improves the student model’s ability to recognize errors while preserving its basic programming skills during knowledge transfer, via correctness-aware supervised learning and fault-aware contrastive learning; (2) the Multi-view Feedback stage measures the quality of the results generated by the student model from two views, namely model-based and static tool-based measurement; (3) the Feedback-based Knowledge Update stage adaptively updates the student model by generating new questions at different difficulty levels, where the levels are categorized according to the feedback from the previous stage. By performing this training cycle iteratively, the student model is continuously refined, learning increasingly advanced programming skills from the teacher model. We compare SODA with four state-of-the-art KD approaches on the code generation task across seven programming languages. Experimental results show that SODA improves the student model by 65.96% in terms of average Pass@1, outperforming the best baseline PERsD by 29.85%. Based on the proposed SODA framework, we develop SodaCoder, a series of lightweight yet effective LCMs with fewer than 7B parameters, which outperform 15 LCMs with no more than 16B parameters. Notably, SodaCoder-DS 6.7B, built on DeepseekCoder-6.7B, even surpasses the prominent ChatGPT in average Pass@1 across the seven programming languages (66.4 vs. 61.3).
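To make the three-stage cycle concrete, the following is a minimal Python sketch of how such a self-paced distillation loop could be wired together. It is an assumption-laden illustration rather than the paper's implementation: the Sample dataclass, the StubLCM interface (supervised_step, contrastive_step, generate, generate_question), the model_based_score and static_tool_score functions, the equal weighting of the two feedback views, and the easy/hard difficulty threshold are all hypothetical placeholders.

```python
# A minimal, illustrative sketch of one SODA-style training cycle as described
# in the abstract above. It is NOT the authors' implementation: every class,
# function, weighting, and threshold here is a hypothetical placeholder.
from dataclasses import dataclass
import random


@dataclass
class Sample:
    question: str      # programming question
    correct_code: str  # teacher-provided correct solution
    faulty_code: str   # teacher-provided faulty variant (for contrastive learning)


class StubLCM:
    """Placeholder standing in for a real teacher or student code model."""

    def supervised_step(self, question, correct_code):
        pass  # Stage 1a: correctness-aware supervised fine-tuning step (placeholder)

    def contrastive_step(self, question, correct_code, faulty_code):
        pass  # Stage 1b: fault-aware contrastive step, pulling the student toward
              # the correct code and away from the faulty variant (placeholder)

    def generate(self, question):
        return "def solve(): ..."  # student's candidate solution (placeholder)

    def generate_question(self, seed_question, difficulty):
        # Teacher generates a new question at the requested difficulty (placeholder).
        return Sample(f"{seed_question} [{difficulty}]", "pass", "raise Exception")


def model_based_score(code: str) -> float:
    return random.random()  # placeholder for a model-based quality judgment in [0, 1]


def static_tool_score(code: str) -> float:
    return random.random()  # placeholder for a static-analysis-based score in [0, 1]


def soda_cycle(student, teacher, samples, num_cycles=3, threshold=0.5):
    for _ in range(num_cycles):
        # Stage 1: correct-and-fault knowledge delivery.
        for s in samples:
            student.supervised_step(s.question, s.correct_code)
            student.contrastive_step(s.question, s.correct_code, s.faulty_code)

        # Stage 2: multi-view feedback; the equal weighting of the two views
        # is an arbitrary choice made for this sketch.
        feedback = []
        for s in samples:
            candidate = student.generate(s.question)
            score = 0.5 * model_based_score(candidate) + 0.5 * static_tool_score(candidate)
            feedback.append((s, score))

        # Stage 3: feedback-based knowledge update; the single-threshold
        # easy/hard policy is a simplification of the paper's difficulty levels.
        samples = [
            teacher.generate_question(s.question, "easy" if score < threshold else "hard")
            for s, score in feedback
        ]
    return student


if __name__ == "__main__":
    seed = [Sample("Reverse a string.", "def rev(s): return s[::-1]", "def rev(s): return s")]
    soda_cycle(StubLCM(), StubLCM(), seed)
```

In a real pipeline, the supervised and contrastive steps would update the student's weights, the model-based view might come from an LLM judge, and the static-tool view from a linter or static analyzer; this sketch only marks where those components plug into the iterative cycle.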
Tue 24 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)

14:00 - 15:30 | LLM for SE 2 (Research Papers / Industry Papers / Ideas, Visions and Reflections) at Cosmos Hall
Chair(s): Jialun Cao (Hong Kong University of Science and Technology)

14:00 (20m) Talk | Migrating Code At Scale With LLMs At Google (Industry Papers)
Celal Ziftci (Google), Stoyan Nikolov (Google, Inc.), Anna Sjovall (Google, Inc.), Bo Kim (Google), Daniele Codecasa (Google, Inc.), Max Kim (Google)

14:20 (20m) Talk | Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning (Research Papers)

14:40 (20m) Talk | Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs (Research Papers)
Yujia Chen (Harbin Institute of Technology, Shenzhen), Yang Ye (Huawei Cloud Computing Technologies Co., Ltd.), Zhongqi Li (Huawei Cloud Computing Technologies Co., Ltd.), Yuchi Ma (Huawei Cloud Computing Technologies), Cuiyun Gao (Harbin Institute of Technology, Shenzhen)

15:00 (10m) Talk | Enabling Scalable Proactive Workspaces With Environment-Wide Context (Ideas, Visions and Reflections)
Nick Bradley (University of British Columbia), Thomas Fritz (University of Zurich), Reid Holmes (University of British Columbia)

15:10 (20m) Talk | Bridging Operator Semantic Inconsistencies: A Source-level Cross-framework Model Conversion Approach (Research Papers)
Xingpei Li (National University of Defense Technology, China), Yan Lei (Chongqing University), Zhouyang Jia (National University of Defense Technology), Yuanliang Zhang (National University of Defense Technology), Haoran Liu (National University of Defense Technology), Liqian Chen (National University of Defense Technology), Wei Dong (National University of Defense Technology), Shanshan Li (National University of Defense Technology)
This is the main event hall of the Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The large doors with numbers “1” and “2” provide access to the Cosmos Hall.