A Study On C Code Defect Detection With Fine-tuned Large Language Models
Large Language Models (LLMs) have demonstrated excellent capabilities across many areas of software engineering (SE), including code completion, code generation, code understanding, and code repair, with ChatGPT the most prominent performer. However, ChatGPT is not open-source, which poses a challenge to implementing LLM-based code defect detection techniques. In this paper, we focus on low-cost, fine-tunable, open-source large language models with fewer than 10B parameters, and study their C code defect detection capabilities when fine-tuned on real-world data and improved with prompt engineering. We studied LLaMa3-8B, DeepSeek-Coder-7b, and Qwen2-7B, as they are representative open-source models with prompt capabilities whose SE performance is close to ChatGPT's. Experimental results show that our method significantly improves the code defect detection performance of LLMs with fewer than 10B parameters, and that the models' output can be applied to several downstream tasks, such as improving the reports of static analysis tools.
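As a rough illustration of the prompt-engineering side of such a pipeline (a sketch, not the paper's actual prompt), a defect-detection query for a fine-tuned model might be assembled like this; the template wording, the `build_prompt` helper, and the toy C function are all hypothetical:

```python
# Hypothetical sketch: assembling a C-defect-detection prompt for a
# fine-tuned LLM. The template text and function name are illustrative
# assumptions, not the prompt used in the paper.

PROMPT_TEMPLATE = (
    "You are a C code auditor. Decide whether the following C function "
    "contains a defect. Answer DEFECTIVE or CLEAN, then explain why.\n\n"
    "{code}\n"
)

def build_prompt(c_source: str) -> str:
    """Fill the detection template with a single C function."""
    return PROMPT_TEMPLATE.format(code=c_source)

# Toy input: an off-by-one read (valid indices are 0..n-1).
example = """int get(int *a, int n) {
    return a[n];
}"""

prompt = build_prompt(example)
print(prompt)
```

The resulting string would be sent to the fine-tuned model; constraining the answer format (DEFECTIVE/CLEAN plus an explanation) is what makes the output easy to post-process for downstream uses such as enriching static-analysis reports.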
Thu 5 Dec (time zone: Beijing, Chongqing, Hong Kong, Urumqi)

14:00 - 15:30 | Session (9): Technical Track / ERA - Early Research Achievements, Room 2 (Xiangshan Ballroom). Chair(s): Zhiqiang Li

- 14:00, 30m Talk (Technical Track): Multi-Hierarchy Metamorphic Testing for Hyphenated Words in Machine Translation. Rui Zhu (Nanjing University of Aeronautics and Astronautics), Chuanqi Tao (Nanjing University of Aeronautics and Astronautics), Jerry Gao (San Jose State University)
- 14:30, 30m Talk (Technical Track): Exploring the Depths of WebAudio: Advancing Greybox Fuzzing for Enhanced Vulnerability Detection in Safari. Jiashui Wang (Zhejiang University), Jiahui Wang (Zhejiang University), Jundong Xie (Ant Group), Zhenyuan Li (Zhejiang University), Yan Chen (Northwestern University), Peng Qian (Zhejiang University)
- 15:00, 20m Talk (ERA - Early Research Achievements): A Study On C Code Defect Detection With Fine-tuned Large Language Models. Yue Wang (Beihang University), Xu Wang (Beihang University), Hongwei Yu (Beihang University), Fei Gao (Beijing Aerospace Automatic Control Institute), Xueshi Liu (Beijing Aerospace Automatic Control Institute), Xiaoling Wang