MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models
This program is tentative and subject to change.
Large language models for code have achieved strong performance across diverse software analytics tasks, yet their real-world adoption remains limited by high computational demands, slow inference speeds, significant energy consumption, and environmental impact. Knowledge distillation (KD) offers a practical solution by transferring knowledge from a large model to a smaller and more efficient model. Despite its effectiveness, recent studies show that models distilled from a single source often exhibit degraded adversarial robustness, even when robustness-aware distillation techniques are employed. These observations suggest a fundamental limitation of single-source distillation in simultaneously transferring high-quality and robust knowledge. To overcome this limitation, we propose Mixture of Experts Knowledge Distillation (MoEKD), a KD framework that leverages a Mixture of Experts (MoE) architecture to enable more effective and robust knowledge transfer from multiple specialized experts into a compact model. MoEKD decomposes the distillation process into expert and router training, aggregation of expert knowledge through a learned routing mechanism, and distillation from the aggregated knowledge. We evaluate MoEKD on the vulnerability detection task using CodeBERT and GraphCodeBERT models. Experimental results show that MoEKD not only improves adversarial robustness by up to 35.8%, but also enhances predictive performance by up to 13%, compared to state-of-the-art KD baselines, including Compressor and AVATAR. Furthermore, an ablation study demonstrates that aggregating expert knowledge enables ultra-compact models to maintain competitive performance even when their size is reduced by approximately half. Overall, these results highlight the effectiveness of multi-expert knowledge aggregation in addressing key limitations of existing single-source KD approaches.
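The three stages named in the abstract (expert and router training, routed aggregation, distillation from the aggregate) can be illustrated with a minimal sketch of the last two. The abstract does not give MoEKD's actual formulation, so everything here is an assumption for illustration: the function names are hypothetical, aggregation is taken to be a router-weighted average of the experts' softened class distributions, and the distillation objective is taken to be the standard temperature-softened KL divergence.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_experts(expert_logits, router_logits, temperature=1.0):
    """Assumed aggregation: router-weighted mix of expert distributions.

    expert_logits: one list of class logits per expert.
    router_logits: one score per expert for the current input.
    """
    weights = softmax(router_logits)  # one weight per expert, sums to 1
    expert_probs = [softmax(logits, temperature) for logits in expert_logits]
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


def distillation_loss(student_logits, teacher_probs, temperature=1.0):
    """Assumed objective: KL(teacher || student) on softened distributions."""
    student_probs = softmax(student_logits, temperature)
    return sum(
        t * math.log(t / s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0
    )


if __name__ == "__main__":
    # Hypothetical binary task (vulnerable vs. not vulnerable), two experts.
    experts = [[2.0, -1.0], [0.5, 0.3]]
    router = [1.2, -0.4]  # the router favours the first expert on this input
    teacher = aggregate_experts(experts, router, temperature=2.0)
    print(distillation_loss([0.8, -0.2], teacher, temperature=2.0))
```

In this sketch the compact student only ever sees the aggregated distribution, so the router decides how much each expert contributes per input. How the experts are specialized and how the router is trained are not described in the abstract and are left out.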
Wed 10 Jun (displayed time zone: London)
15:30 - 17:00 | AI Systems Engineering 2 | Industry Papers / Research Papers at JMS 745 | Chair(s): Jingyue Li Norwegian University of Science and Technology (NTNU)
15:30 15m Talk | DeepParse: Hybrid Log Parsing with LLM-Synthesized Regex Masks | Research Papers | Pre-print
15:45 15m Talk | MoEKD: Mixture-of-Experts Knowledge Distillation for Robust and High-Performing Compressed Code Models | Research Papers | Md. Abdul Awal University of Saskatchewan, Mrigank Rochan University of Saskatchewan, Chanchal K. Roy University of Saskatchewan | Pre-print
16:00 15m Talk | HKI-RAG: Hierarchical Knowledge Indexing for Retrieval-Augmented Generation in Distributed Heterogeneous Architectures | Research Papers | Chenglin Zhang School of Artificial Intelligence, China University of Geosciences (Beijing), Teng Long School of Artificial Intelligence, China University of Geosciences (Beijing)
16:15 15m Talk | PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection | Research Papers | Taoufik Kaouthar El Idrissi Polytechnique Montreal, Edward Zulkoski Quantstamp, Mohammad Hamdaqa Polytechnique Montreal
16:30 10m Talk | Engineering a Governance-Aware AI Sandbox: Design, Implementation, and Lessons Learned | Industry Papers | Muhammad Waseem Faculty of Information Technology and Communication Sciences, Tampere University, 33014 Tampere, Finland, Md Aidul Islam Faculty of Information Technology and Communication Sciences, Tampere University, 33014 Tampere, Finland, Md Nasir Uddin Shuvo Faculty of Information Technology and Communication Sciences, Tampere University, 33014 Tampere, Finland, Md Mahade Hasan Tampere University, Kai-Kristian Kemell Tampere University, Jussi Rasku Tampere University, Mika Saari Tampere University, Vilma Saari DIMECC Oy., Tampere, Finland, Roope Pajasmaa DIMECC Oy., Tampere, Finland, Markku Oivo DIMECC Oy., Tampere, Finland, Pekka Abrahamsson Tampere University
16:40 15m Talk | Industry Practitioners’ Perspectives on AI Model Quality: Perceptions, Challenges, and Solutions | Research Papers | Chenyu Wang Singapore Management University, Zhou Yang University of Alberta, Alberta Machine Intelligence Institute, Yunbo Lyu Singapore Management University, Ze Shi (Zane) Li University of Oklahoma, Dana Damian University of Victoria, David Lo Singapore Management University | Pre-print