Retrieve, Refine, or Both? Using Task-Specific Guidelines for Secure Python Code Generation
This program is tentative and subject to change.
Large Language Models (LLMs) are increasingly used for code generation, but they often produce code with security vulnerabilities. While techniques like fine-tuning and instruction tuning can improve security, they are computationally expensive and require large amounts of secure code data. Recent studies have explored prompting techniques to enhance code security without additional training. Among these, Recursive Criticism and Improvement (RCI) has demonstrated strong improvements by leveraging LLMs’ self-critiquing capabilities to iteratively refine the generated code. However, RCI relies on the model’s ability to identify security flaws, which is constrained by its training data and its susceptibility to hallucinations.
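For illustration, a minimal sketch of an RCI-style refinement loop is shown below; the `llm` callable, the prompt wording, and the number of rounds are assumptions for this sketch, not the setup used in the paper.

```python
# Minimal sketch of a Recursive Criticism and Improvement (RCI) loop.
# `llm` is a hypothetical callable that sends a prompt to a model and
# returns its text response; prompts and round count are illustrative.
def rci_refine(llm, task_description, max_rounds=2):
    """Generate code, then repeatedly ask the model to critique and fix it."""
    code = llm(f"Write Python code for the following task:\n{task_description}")
    for _ in range(max_rounds):
        # Ask the model to critique its own output for security issues.
        critique = llm(
            "Review the following Python code for security vulnerabilities "
            f"and describe any issues you find:\n{code}"
        )
        # Ask the model to rewrite the code based on its own critique.
        code = llm(
            "Rewrite the code to address the issues below while keeping its "
            f"functionality.\nIssues:\n{critique}\nCode:\n{code}"
        )
    return code
```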
This paper investigates the impact of incorporating task-specific secure coding guidelines, extracted from MITRE’s CWE and CodeQL recommendations, into LLM prompts. We employ Retrieval-Augmented Generation (RAG) to dynamically retrieve relevant guidelines that help the LLM avoid generating insecure code. We compare RAG with RCI, observing that both deliver comparable performance in terms of code security, with RAG consuming considerably less time and fewer tokens. Additionally, combining both approaches further reduces the amount of insecure code generated, while requiring only slightly more resources than RCI alone, highlighting the benefit of adding relevant guidelines for improving the security of LLM-generated code.
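For illustration, a minimal sketch of guideline retrieval via embedding similarity is shown below; the `sentence-transformers` model, the toy guideline snippets, and the prompt template are assumptions for this sketch, not the paper’s actual corpus or pipeline.

```python
# Sketch of retrieval-augmented prompting with secure-coding guidelines.
# The guideline snippets and prompt wording are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

# Hypothetical guideline snippets distilled from CWE / CodeQL recommendations.
GUIDELINES = [
    "CWE-89: Build SQL statements with parameterized queries, never string concatenation.",
    "CWE-78: Avoid shell=True; pass command arguments as a list to subprocess.",
    "CWE-22: Normalize and validate user-supplied paths before opening files.",
]

_model = SentenceTransformer("all-MiniLM-L6-v2")
_guideline_embeddings = _model.encode(GUIDELINES, convert_to_tensor=True)

def retrieve_guidelines(task_description, top_k=2):
    """Return the guidelines most relevant to the coding task."""
    query = _model.encode(task_description, convert_to_tensor=True)
    hits = util.semantic_search(query, _guideline_embeddings, top_k=top_k)[0]
    return [GUIDELINES[hit["corpus_id"]] for hit in hits]

def secure_prompt(task_description):
    """Prepend retrieved guidelines to the code-generation prompt."""
    guidelines = "\n".join(retrieve_guidelines(task_description))
    return (
        "Follow these secure coding guidelines:\n"
        f"{guidelines}\n\n"
        f"Write Python code for the following task:\n{task_description}"
    )
```

The retrieved guidelines could also be fed into the RCI loop above, which corresponds to the combined RAG+RCI setting evaluated in the paper.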
Thu 11 Sep (displayed time zone: Auckland, Wellington)
15:30 - 17:00 | Session 12 - Security 1 | NIER Track / Research Papers Track / Tool Demonstration Track / Journal First Track | Room TBD2

15:30 (15m) | Retrieve, Refine, or Both? Using Task-Specific Guidelines for Secure Python Code Generation | Research Papers Track | Catherine Tony (Hamburg University of Technology), Emanuele Iannone (Hamburg University of Technology), Riccardo Scandariato (Hamburg University of Technology) | Pre-print

15:45 (15m) | SAEL: Leveraging Large Language Models with Adaptive Mixture-of-Experts for Smart Contract Vulnerability Detection | Research Papers Track | Lei Yu (Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Shiqi Cheng (Institute of Software, Chinese Academy of Sciences, China), Zhirong Huang (Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Jingyuan Zhang (Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Chenjie Shen (Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Junyi Lu (Institute of Software, Chinese Academy of Sciences & University of Chinese Academy of Sciences, China), Li Yang (Institute of Software, Chinese Academy of Sciences), Fengjun Zhang (Institute of Software, Chinese Academy of Sciences, China), Jiajia Ma (Institute of Software, Chinese Academy of Sciences, China)

16:00 (15m) | Evaluating the maintainability of Forward-Porting vulnerabilities in fuzzer benchmarks | Research Papers Track | Timothée Riom (Umeå Universitet), Sabine Houy (Umeå Universitet), Bruno Kreyssig (Umeå University), Alexandre Bartel (Umeå University)

16:15 (10m) | VulGuard: An Unified Tool for Evaluating Just-In-Time Vulnerability Prediction Models | Tool Demonstration Track | Duong Nguyen (Hanoi University of Science and Technology), Manh Tran-Duc (Hanoi University of Science and Technology), Le-Cong Thanh (The University of Melbourne), Triet Le (The University of Adelaide), Muhammad Ali Babar (School of Computer Science, The University of Adelaide), Quyet Thang Huynh (Hanoi University of Science and Technology)

16:25 (10m) | Explicit Vulnerability Generation with LLMs: An Investigation Beyond Adversarial Attacks | NIER Track | Emir Bosnak (Bilkent University), Sahand Moslemi Yengejeh (Bilkent University), Mayasah Lami (Bilkent University), Anil Koyuncu (Bilkent University) | Pre-print

16:35 (10m) | Detecting Adversarial Prompted AI-Generated Code on Stack Overflow: A Benchmark Dataset and an Enhanced Detection Approach | NIER Track | Aman Swaraj (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India), Krishna Agarwal (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India), Atharv Joshi (Indian Institute of Technology Roorkee), Sandeep Kumar (Dept. of Computer Science & Engineering, Indian Institute of Technology, Roorkee, India)

16:45 (15m) | Vulnerabilities in Infrastructure as Code: What, How Many, and Who? | Journal First Track | Aïcha War (University of Luxembourg), Alioune Diallo (University of Luxembourg), Andrew Habib (ABB Corporate Research, Germany), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg)