This program is tentative and subject to change.
Large Language Models for Code (Code LLMs) are increasingly employed in software development. However, studies have recently shown that these models are vulnerable to backdoor attacks: when a trigger (a specific input pattern) appears in the input, the backdoor will be activated and cause the model to generate malicious outputs desired by the attacker. Researchers have designed various triggers and demonstrated the feasibility of implanting backdoors by poisoning a fraction of the training data (known as data poisoning). Some basic conclusions have been made, such as backdoors becoming easier to implant when attackers modify more training data. However, existing research has not explored other factors influencing backdoor attacks on Code LLMs, such as training batch size, epoch number, and the broader design space for triggers, e.g., trigger length. To bridge this gap, we use the code summarization task as an example to perform a comprehensive empirical study that systematically investigates the factors affecting backdoor effectiveness and understands the extent of the threat posed by backdoor attacks on Code LLMs. Three categories of factors are considered: data, model, and inference, revealing findings overlooked in previous studies for practitioners to mitigate backdoor threats. For example, Code LLM developers can adopt higher batch sizes with fewer epochs appropriately. Users of code models can adjust inference parameters, such as using a higher temperature or a larger top-k, appropriately. Future backdoor defense can prioritize the inspection of rarer and longer tokens, since they are more effective if they are indeed triggers. Since these non-backdoor design factors can also greatly sway attack performance, future backdoor studies should fully report settings, control key factors, and systematically vary them across configurations. What’s more, we find that the prevailing consensus, that attacks are ineffective at extremely low poisoning rates, is incorrect. The absolute number of poisoned samples matters as well. Specifically, poisoning just 20 out of 454,451 samples (0.004% poisoning rate, far below the minimum setting of 0.1% considered in prior Code LLM backdoor attack studies) successfully implants backdoors! Moreover, the common defense is incapable of removing even a single poisoned sample from this poisoned dataset, highlighting the urgent need for defense mechanisms against extremely low poisoning rate settings.
This program is tentative and subject to change.
Wed 19 NovDisplayed time zone: Seoul change
14:00 - 15:30 | |||
14:00 10mTalk | Advancing Binary Code Similarity Detection via Context-Content Fusion and LLM Verification Research Papers Chaopeng Dong Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China;, Jingdong Guo Institute of Information Engineering, CAS, Beijing, China; School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China;, Shouguo Yang Zhongguancun Laboratory, Beijing, China, Yi Li Nanyang Technological University, Dongliang Fang Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering, CAS, China; School of Cyber Security, University of Chinese Academy of Sciences, China, Yang Xiao Chinese Academy of Sciences, Yongle Chen Taiyuan University of Technology, China, Limin Sun Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences | ||
14:10 10mTalk | ACTaint: Agent-Based Taint Analysis for Access Control Vulnerabilities in Smart Contracts Research Papers Huarui Lin Zhejiang University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Jiachi Chen Sun Yat-sen University, Xiang Chen Nantong University, Xiaohu Yang Zhejiang University, Lingfeng Bao Zhejiang University | ||
14:20 10mTalk | AMPLE: Fine-grained File Access Policies for Server Applications Research Papers | ||
14:30 10mTalk | Mockingbird: Efficient Excessive Data Exposures Detection via Dynamic Code Instrumentation Research Papers Chenxiao Xia Beijing Institute of Technology, Jiazheng Sun Fudan University, Jun Zheng Beijing Institute of Technology, Yu-an Tan Beijing Institute of Technology, Hongyi Su Beijing Institute of Technology | ||
14:40 10mTalk | DrainCode: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning Research Papers Jiadong Wu School of Software Engineering, Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Tianyue Jiang Sun Yat-sen University, Mingwei Liu Sun Yat-Sen University, Jiachi Chen Sun Yat-sen University, Chong Wang Nanyang Technological University, Ensheng Shi Huawei, Xilin Liu Huawei Cloud, Yuchi Ma Huawei Cloud Computing Technologies, Hongyu Zhang Chongqing University, Zibin Zheng Sun Yat-sen University | ||
14:50 10mTalk | Finding Insecure State Dependency in DApps via Multi-Source Tracing and Semantic Enrichment Research Papers Jingwen Zhang School of Software Engineering, Sun Yat sen University, Yuhong Nan Sun Yat-sen University, Wei Li School of Software Engineering, Sun Yat sen University, Kaiwen Ning Sun Yat-sen University, Zewei Lin Sun Yat-sen University, Zitong Yao School of Software Engineering, Sun Yat sen University, Yuming Feng Peng Cheng Laboratory, Weizhe Zhang Harbin Institute of Technology, Zibin Zheng Sun Yat-sen University | ||
15:00 10mTalk | Better Safe than Sorry: Preventing Policy Violations through Predictive Root-Cause-Analysis for IoT Systems Research Papers Michael Norris Penn State University, Syed Rafiul Hussain Pennsylvania State University, Gang (Gary) Tan Pennsylvania State University | ||
15:10 10mTalk | Backdoors in Code Summarizers: How Bad Is It? Research Papers Chenyu Wang Singapore Management University, Zhou Yang University of Alberta, Alberta Machine Intelligence Institute , Yaniv Harel Tel Aviv University, David Lo Singapore Management University Pre-print | ||
15:20 10mTalk | ProfMal: Detecting Malicious NPM Packages by the Synergy between Static and Dynamic Analysis Research Papers Yiheng Huang Fudan University, Wen Zheng Fudan University, Susheng Wu Fudan University, Bihuan Chen Fudan University, You Lu Fudan University, Zhuotong Zhou Fudan University, Yiheng Cao Fudan University, Xiaoyu Li Fudan University, Xin Peng Fudan University | ||