Attribution-guided Adversarial Code Prompt Generation for Code Completion Models (ASE 2024 - Research Papers)

Who

Xueyang Li, Guozhu Meng, Shangqing Liu, Lu Xiang, Kun Sun, Kai Chen, Xiapu Luo, Yang Liu

Track

ASE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

When

Thu 31 Oct 2024 15:30 - 15:45 at Compagno - Code completion Chair(s): Baishakhi Ray

Abstract

Large language models have made significant progress in code completion, which may reshape future software development. However, these code completion models (CCMs) are found to be highly risky: they may introduce vulnerabilities unintentionally, or be induced to do so by a special input, i.e., an adversarial code prompt. Prior studies mainly focus on the robustness of these models, but their security has not been fully analyzed. In this paper, we propose a novel approach, ADVPRO, that automatically generates adversarial code prompts for these code completion models. ADVPRO incorporates 14 code mutation strategies at five levels of granularity. The mutation strategies are guaranteed not to modify code semantics, so the models should be insensitive to them. Moreover, we leverage gradient attribution to localize important code as mutation points and to speed up adversarial prompt generation. Extensive experiments are conducted on 13 state-of-the-art models belonging to 7 families. The results show that our approach can effectively generate adversarial prompts, improving the rate by 69.6% over the baseline ALERT. By comparing the results of attribution-guided localization, we find that the important tokens identified in the input code are almost identical across different models. This finding reduces the limitations of using open-source substitute models to guide adversarial attacks against closed-source models. The ablation study on the components of ADVPRO shows that CCMs focus on variable names, but other structures are equally crucial.
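To make the idea of a semantics-preserving mutation concrete, here is a minimal Python sketch of one such transformation: renaming a local variable. It is an illustration only and is not ADVPRO's implementation; the abstract does not list the 14 strategies, and the names RenameLocal, rename_local, run, and the sample function total are hypothetical. It assumes Python 3.9 or later for ast.unparse.

```python
import ast


class RenameLocal(ast.NodeTransformer):
    """Rename every use of one identifier (hypothetical helper for illustration)."""

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        # Name nodes cover both reads and writes of a variable.
        if node.id == self.old:
            node.id = self.new
        return node


def rename_local(source: str, old: str, new: str) -> str:
    """Return `source` with local variable `old` renamed to `new`."""
    tree = ast.parse(source)
    params = {a.arg for a in ast.walk(tree) if isinstance(a, ast.arg)}
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    funcs = {f.name for f in ast.walk(tree) if isinstance(f, ast.FunctionDef)}
    if old in params:
        # Parameters are left alone so the keyword-argument interface is unchanged.
        raise ValueError(f"{old!r} is a parameter, not a local variable")
    if new in used | params | funcs:
        # A colliding name could change behavior, so refuse the mutation.
        raise ValueError(f"{new!r} is already in use")
    return ast.unparse(RenameLocal(old, new).visit(tree))


def run(source: str, func_name: str, *args):
    """Execute `source` in a fresh namespace and call one of its functions."""
    namespace = {}
    exec(source, namespace)
    return namespace[func_name](*args)


ORIGINAL = '''
def total(values):
    acc = 0
    for v in values:
        acc += v
    return acc
'''

MUTATED = rename_local(ORIGINAL, "acc", "tmp_0")
```

Running both versions on the same input, for example run(ORIGINAL, "total", [1, 2, 3]) and run(MUTATED, "total", [1, 2, 3]), gives the same result, which is the property the abstract requires of every mutation. A real tool would also need to handle scoping, global and nonlocal declarations, and languages other than Python, none of which this sketch attempts.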

Xueyang Li

Institute of Information Engineering, Chinese Academy of Sciences, China

Guozhu Meng

Institute of Information Engineering, Chinese Academy of Sciences, China

Shangqing Liu

Nanyang Technological University

Lu Xiang

SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China

Kun Sun

Institute of Information Engineering, Chinese Academy of Sciences

Kai Chen

Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences, China

Xiapu Luo

Hong Kong Polytechnic University, China

Yang Liu

Nanyang Technological University, Singapore

Session Program

Thu 31 Oct
Displayed time zone: Pacific Time (US & Canada)

15:30 - 16:30	Code completion (Research Papers / NIER Track) at Compagno. Chair(s): Baishakhi Ray (Columbia University, New York; AWS AI Lab)

15:30 (15m, Talk) Attribution-guided Adversarial Code Prompt Generation for Code Completion Models [Research Papers]
	Xueyang Li (Institute of Information Engineering, Chinese Academy of Sciences, China); Guozhu Meng (Institute of Information Engineering, Chinese Academy of Sciences); Shangqing Liu (Nanyang Technological University); Lu Xiang (SKLOIS, Institute of Information Engineering, Chinese Academy of Sciences, China); Kun Sun (Institute of Information Engineering, Chinese Academy of Sciences); Kai Chen (Institute of Information Engineering at Chinese Academy of Sciences; University of Chinese Academy of Sciences); Xiapu Luo (Hong Kong Polytechnic University); Yang Liu (Nanyang Technological University)

15:45 (15m, Talk) DroidCoder: Enhanced Android Code Completion with Context-Enriched Retrieval-Augmented Generation [Research Papers]
	Xinran Yu (Nanjing University); Chun Li (Nanjing University); Minxue Pan (Nanjing University); Xuandong Li (Nanjing University)

16:00 (15m, Talk) GraphCoder: Enhancing Repository-Level Code Completion via Coarse-to-fine Retrieval Based on Code Context Graph [Research Papers]
	Wei Liu (Nanjing University); Ailun Yu (Peking University); Daoguang Zan (Institute of Software, Chinese Academy of Sciences); Bo Shen (Huawei Cloud Computing Technologies Co., Ltd.); Wei Zhang (Peking University); Haiyan Zhao (Peking University); Zhi Jin (Peking University); Qianxiang Wang (Huawei Technologies Co., Ltd)

16:15 (10m, Talk) RepoSim: Evaluating Prompt Strategies for Code Completion via User Behavior Simulation [NIER Track]
	Chao Peng (ByteDance); Qinyun Wu (Bytedance Ltd.); Jiangchao Liu (ByteDance); Jierui Liu (ByteDance); Bo Jiang (Bytedance Network Technology); Mengqian Xu (East China Normal University); Yinghao Wang (ByteDance); Xia Liu (ByteDance); Ping Yang (Bytedance Network Technology)