On the Evaluation of Large Language Models in Multilingual Vulnerability Repair
This program is tentative and subject to change.
Various deep learning-based approaches built on pre-trained language models have been proposed for automatically repairing software vulnerabilities. However, these approaches are limited to a specific programming language (C/C++). Recent advances in large language models (LLMs) offer language-agnostic capabilities and strong semantic understanding, and thus show potential to overcome this single-language limitation. Although some work has begun to explore the repair performance of LLMs, their effectiveness remains unsatisfactory. To address these limitations, we conducted a large-scale empirical study of the performance of automated vulnerability repair approaches and state-of-the-art LLMs across seven programming languages. Results show that GPT-4o, instruction-tuned with few-shot prompting, performs competitively against the leading approach, VulMaster. Additionally, the LLM-based approach shows superior performance in repairing unique vulnerabilities and is more likely to repair the most dangerous vulnerabilities. Instruction-tuned GPT-4o demonstrates strong generalization to vulnerabilities in a previously unseen language, outperforming existing approaches. Our analysis shows that Go consistently achieves the highest effectiveness across all model types, while C/C++ performs the worst. Based on these findings, we discuss the promise of LLMs for multilingual vulnerability repair and the reasons behind the cases in which LLMs failed. This work takes a first look at repair approaches and LLMs across multiple languages, highlighting the promising future of adopting LLMs for multilingual vulnerability repair.
Fri 17 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | AI for Software Engineering 21 | Research Track / New Ideas and Emerging Results (NIER) / Journal-first Papers at Asia IV | Chair(s): Rui Abreu (Faculty of Engineering of the University of Porto, Portugal)
11:00 | 15m | Talk | On the Evaluation of Large Language Models in Multilingual Vulnerability Repair | Journal-first Papers | Dong Wang (Tianjin University), Junji Yu (Tianjin University), Honglin Shu (Kyushu University), Michael Fu (The University of Melbourne), Kla Tantithamthavorn (Monash University), Yasutaka Kamei (Kyushu University), Junjie Chen (Tianjin University)
11:15 | 15m | Talk | Not All Input Helps: What Information Should We Feed to LLMs for Vulnerability Repair? | New Ideas and Emerging Results (NIER)
11:30 | 15m | Talk | EMC: A Semantic-Enhanced Malware Classification Method with Robustness and Scalability | Research Track | Haojun Zhao (Huazhong University of Science and Technology), Yueming Wu (Huazhong University of Science and Technology), Zhen Li (Huazhong University of Science and Technology), Deqing Zou (Huazhong University of Science and Technology)
11:45 | 15m | Talk | When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation | Research Track | Yue Liu (Monash University), Zhenchang Xing (CSIRO's Data61), Shidong Pan (Columbia University & New York University), Kla Tantithamthavorn (Monash University) | Pre-print
12:00 | 15m | Talk | Software Vulnerability Management in the Era of Artificial Intelligence: An Industry Perspective | Research Track | M. Mehdi Kholoosi (Adelaide University), Triet Le (Adelaide University), Muhammad Ali Babar (School of Computer Science, The University of Adelaide) | Pre-print
12:15 | 15m | Talk | Towards Scalable and Interpretable Mobile App Risk Analysis via Large Language Models | Research Track | Yu Yang (Zhejiang University), Zhenyuan Li (Zhejiang University), Xiandong Ran (Huawei Technologies Co., Ltd.), Jiahao Liu (National University of Singapore), Jiahui Wang (Zhejiang University), Bo Yu (National University of Defense Technology), Shouling Ji (Zhejiang University)