Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs
Large code models (LCMs) have remarkably advanced the field of code generation. Despite their impressive capabilities, they still face practical deployment issues, such as high inference costs, the limited accessibility of proprietary LCMs, and the difficulty of adapting ultra-large LCMs. These issues highlight the critical need for more accessible, lightweight yet effective LCMs. Knowledge distillation (KD) offers a promising solution by transferring the programming capabilities of larger, advanced LCMs (Teacher) to smaller, less powerful LCMs (Student). However, existing KD methods often overlook fault knowledge and rely on static seed knowledge, which limits their effectiveness.
In this paper, we propose a novel Self-Paced knOwledge DistillAtion framework, named SODA, which aims to develop lightweight yet effective student LCMs by continually transferring programming capabilities from advanced teacher LCMs. SODA consists of three stages in one cycle: (1) the Correct-and-Fault Knowledge Delivery stage improves the student model’s ability to recognize errors while preserving its basic programming skills during knowledge transfer, via correctness-aware supervised learning and fault-aware contrastive learning; (2) the Multi-view Feedback stage measures the quality of the results generated by the student model from two views, namely model-based and static tool-based measurement; (3) the Feedback-based Knowledge Update stage adaptively updates the student model by generating new questions at different difficulty levels, where the levels are categorized according to the feedback from the previous stage. By performing this training cycle iteratively, the student model is continuously refined, learning increasingly advanced programming skills from the teacher model. We compare SODA with four state-of-the-art KD approaches on the code generation task across seven programming languages. Experimental results show that SODA improves the student model by 65.96% in terms of average Pass@1, outperforming the best baseline PERsD by 29.85%. Based on the proposed SODA framework, we develop SodaCoder, a series of lightweight yet effective LCMs with fewer than 7B parameters, which outperform 15 LCMs with no more than 16B parameters. Notably, SodaCoder-DS 6.7B, built on DeepseekCoder-6.7B, even surpasses the prominent ChatGPT in average Pass@1 across the seven programming languages (66.4 vs. 61.3).
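To make the three-stage cycle concrete, the following is a minimal Python sketch of how such a self-paced distillation loop could be wired together. It is an assumption-laden illustration rather than the paper's implementation: the Sample dataclass, the StubLCM interface (supervised_step, contrastive_step, generate, generate_question), the model_based_score and static_tool_score functions, the equal weighting of the two feedback views, and the easy/hard difficulty threshold are all hypothetical placeholders.

```python
# A minimal, illustrative sketch of one SODA-style training cycle as described
# in the abstract above. It is NOT the authors' implementation: every class,
# function, weighting, and threshold here is a hypothetical placeholder.
from dataclasses import dataclass
import random


@dataclass
class Sample:
    question: str      # programming question
    correct_code: str  # teacher-provided correct solution
    faulty_code: str   # teacher-provided faulty variant (for contrastive learning)


class StubLCM:
    """Placeholder standing in for a real teacher or student code model."""

    def supervised_step(self, question, correct_code):
        pass  # Stage 1a: correctness-aware supervised fine-tuning step (placeholder)

    def contrastive_step(self, question, correct_code, faulty_code):
        pass  # Stage 1b: fault-aware contrastive step, pulling the student toward
              # the correct code and away from the faulty variant (placeholder)

    def generate(self, question):
        return "def solve(): ..."  # student's candidate solution (placeholder)

    def generate_question(self, seed_question, difficulty):
        # Teacher generates a new question at the requested difficulty (placeholder).
        return Sample(f"{seed_question} [{difficulty}]", "pass", "raise Exception")


def model_based_score(code: str) -> float:
    return random.random()  # placeholder for a model-based quality judgment in [0, 1]


def static_tool_score(code: str) -> float:
    return random.random()  # placeholder for a static-analysis-based score in [0, 1]


def soda_cycle(student, teacher, samples, num_cycles=3, threshold=0.5):
    for _ in range(num_cycles):
        # Stage 1: correct-and-fault knowledge delivery.
        for s in samples:
            student.supervised_step(s.question, s.correct_code)
            student.contrastive_step(s.question, s.correct_code, s.faulty_code)

        # Stage 2: multi-view feedback; the equal weighting of the two views
        # is an arbitrary choice made for this sketch.
        feedback = []
        for s in samples:
            candidate = student.generate(s.question)
            score = 0.5 * model_based_score(candidate) + 0.5 * static_tool_score(candidate)
            feedback.append((s, score))

        # Stage 3: feedback-based knowledge update; the single-threshold
        # easy/hard policy is a simplification of the paper's difficulty levels.
        samples = [
            teacher.generate_question(s.question, "easy" if score < threshold else "hard")
            for s, score in feedback
        ]
    return student


if __name__ == "__main__":
    seed = [Sample("Reverse a string.", "def rev(s): return s[::-1]", "def rev(s): return s")]
    soda_cycle(StubLCM(), StubLCM(), seed)
```

In a real pipeline, the supervised and contrastive steps would update the student's weights, the model-based view might come from an LLM judge, and the static-tool view from a linter or static analyzer; this sketch only marks where those components plug into the iterative cycle.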
Tue 24 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)

14:00 - 15:30 | LLM for SE 2 (Research Papers / Industry Papers / Ideas, Visions and Reflections) at Cosmos Hall
Chair(s): Jialun Cao (Hong Kong University of Science and Technology)

14:00 (20m) Talk | Migrating Code At Scale With LLMs At Google (Industry Papers)
Celal Ziftci (Google), Stoyan Nikolov (Google, Inc.), Anna Sjovall (Google, Inc.), Bo Kim (Google), Daniele Codecasa (Google, Inc.), Max Kim (Google)

14:20 (20m) Talk | Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning (Research Papers)

14:40 (20m) Talk | Smaller but Better: Self-Paced Knowledge Distillation for Lightweight yet Effective LCMs (Research Papers)
Yujia Chen (Harbin Institute of Technology, Shenzhen), Yang Ye (Huawei Cloud Computing Technologies Co., Ltd.), Zhongqi Li (Huawei Cloud Computing Technologies Co., Ltd.), Yuchi Ma (Huawei Cloud Computing Technologies), Cuiyun Gao (Harbin Institute of Technology, Shenzhen)

15:00 (10m) Talk | Enabling Scalable Proactive Workspaces With Environment-Wide Context (Ideas, Visions and Reflections)
Nick Bradley (University of British Columbia), Thomas Fritz (University of Zurich), Reid Holmes (University of British Columbia)

15:10 (20m) Talk | Bridging Operator Semantic Inconsistencies: A Source-level Cross-framework Model Conversion Approach (Research Papers)
Xingpei Li (National University of Defense Technology, China), Yan Lei (Chongqing University), Zhouyang Jia (National University of Defense Technology), Yuanliang Zhang (National University of Defense Technology), Haoran Liu (National University of Defense Technology), Liqian Chen (National University of Defense Technology), Wei Dong (National University of Defense Technology), Shanshan Li (National University of Defense Technology)
This is the main event hall of the Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.
The room is just in front of the registration desk, on the other side of the main conference area. The large doors with numbers “1” and “2” provide access to the Cosmos Hall.