Patch the Leak: Strengthening CodeLLMs Against Privacy Extraction Threats
CodeLLMs tend to memorize their training data and can reconstruct personal information (PI) when given specific prompts. Although privacy anonymization methods are applied to remove PI from foundational LLMs, previous experiments with state-of-the-art PI extraction attacks such as CODEBREAKER and CodexLeaks on multiple open-source and commercial CodeLLMs demonstrate that such information cannot be fully eliminated. Furthermore, we found that commercial models exhibit significantly lower leakage rates (approximately 20% lower) than open-source models, and we hypothesize that this is related to stronger model alignment. To address the lack of effective defenses against PI extraction, we treat PI leakage as a form of misalignment and propose PI-ALIGN, a novel framework inspired by adversarial learning. PI-ALIGN pairs CodeLLMs with the CODEBREAKER attack framework as an adversarial dual model and leverages an optimized GRPO (Group Relative Policy Optimization) process to realign the model during fine-tuning. This approach is expected to enhance the model’s robustness against PI extraction attacks by adversarially training it against CODEBREAKER. We also outline our experimental evaluation framework to systematically validate PI-ALIGN’s effectiveness, aiming to provide insights into countering PI extraction attacks on CodeLLMs.
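To make the GRPO step mentioned in the abstract more concrete, here is a minimal sketch of the group-relative advantage computation that gives GRPO its name: several completions are sampled for one prompt, each is scored, and the scores are normalized within the group. This is not the authors' PI-ALIGN implementation. The reward function `leak_reward`, the email regex used as a stand-in PI detector, and the sample completions are all hypothetical, and the choice of population standard deviation is an assumption (implementations differ on this detail).

```python
import re
from statistics import mean, pstdev

# Hypothetical PI detector: an email regex stands in for a real PI scanner
# or for the verdict of an extraction attack. Not taken from the paper.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def leak_reward(completion: str) -> float:
    """Assumed reward: 1.0 if no email-like string appears, else 0.0."""
    return 0.0 if EMAIL_RE.search(completion) else 1.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions are judged relative to their siblings, without a critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative group of completions for a single extraction-style prompt.
completions = [
    "contact = 'alice@example.com'",
    "contact = os.environ['CONTACT_EMAIL']",
    "contact = None",
    "admin = 'bob@example.org'",
]
rewards = [leak_reward(c) for c in completions]
advantages = group_relative_advantages(rewards)
```

In this toy group, the two completions containing an email-like string receive negative advantages and the two clean ones receive positive advantages, so a policy-gradient update would shift probability away from the leaking outputs. A full GRPO objective also includes a clipped probability ratio and typically a KL penalty toward a reference model, both omitted here.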
Sat 28 Jun (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:00 - 12:30 | Intelligence and Privacy | EXPRESS | at Cosmos 3B | Chair(s): Peng Di (Ant Group & UNSW Sydney), Puzhuo Liu (Ant Group & Tsinghua University)
11:00 | 20m | Talk | Patch the Leak: Strengthening CodeLLMs Against Privacy Extraction Threats | EXPRESS | Yongjian Guo (Tsinghua University & Ant Group), Wanlun Ma (Swinburne University of Technology), Xi Xiao (Tsinghua University), Sheng Wen (Swinburne University of Technology), Peng Di (Ant Group & UNSW Sydney), Xiaogang Zhu (The University of Adelaide)
11:20 | 20m | Talk | From Large Language Models to Adversarial Malware: How far are we | EXPRESS | Shuai He, Hao Yan, Wenke Li, Sheng Hong, Xiaowei Guo, Xiaofan Liu, Cai Fu (all Huazhong University of Science and Technology)
11:40 | 20m | Talk | Towards Source Mapping for Zero-Knowledge Smart Contracts: Design and Preliminary Evaluation | EXPRESS | Pei Xu (University of Technology Sydney), Yulei Sui (University of New South Wales), Mark Staples (Digital Finance CRC)
12:00 | 20m | Talk | TestFlow: Advancing Mobile UI Testing through Multi-Step Reinforcement Learning | EXPRESS | Xiaoxuan Tang (Ant Group), Xinfang Chen (Ant Group), Dajun Chen (Ant Group), Sheng Zhou (Zhejiang University), Wei Jiang (Ant Group), Yong Li (Ant Group)
12:20 | 10m | Day closing | Discussion and Conclusion | EXPRESS
Cosmos 3B is the second room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.