A Study On C Code Defect Detection With Fine-tuned Large Language Models
Large Language Models (LLMs) have demonstrated excellent capabilities across many areas of software engineering (SE), including code completion, code generation, code understanding, and code repair, with ChatGPT the most prominent performer. However, ChatGPT is not open-source, which poses a challenge to implementing LLM-based code defect detection techniques. In this paper, we focus on low-cost, fine-tunable, open-source large language models with fewer than 10B parameters, and study their C code defect detection capabilities when fine-tuned on real-world data and improved with prompt engineering. We studied LLaMa3-8B, DeepSeek-Coder-7b, and Qwen2-7B, as they are representative open-source models with prompt capabilities whose SE performance is close to ChatGPT's. Experimental results show that our method significantly improves the code defect detection performance of LLMs with fewer than 10B parameters, and that the models' output can be applied to several downstream tasks, such as improving the reports of static analysis tools.
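As a rough illustration of the prompt-engineering side of such a pipeline (a sketch, not the paper's actual prompt), a defect-detection query for a fine-tuned model might be assembled like this; the template wording, the `build_prompt` helper, and the toy C function are all hypothetical:

```python
# Hypothetical sketch: assembling a C-defect-detection prompt for a
# fine-tuned LLM. The template text and function name are illustrative
# assumptions, not the prompt used in the paper.

PROMPT_TEMPLATE = (
    "You are a C code auditor. Decide whether the following C function "
    "contains a defect. Answer DEFECTIVE or CLEAN, then explain why.\n\n"
    "{code}\n"
)

def build_prompt(c_source: str) -> str:
    """Fill the detection template with a single C function."""
    return PROMPT_TEMPLATE.format(code=c_source)

# Toy input: an off-by-one read (valid indices are 0..n-1).
example = """int get(int *a, int n) {
    return a[n];
}"""

prompt = build_prompt(example)
print(prompt)
```

The resulting string would be sent to the fine-tuned model; constraining the answer format (DEFECTIVE/CLEAN plus an explanation) is what makes the output easy to post-process for downstream uses such as enriching static-analysis reports.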
Thu 5 Dec (time zone: Beijing, Chongqing, Hong Kong, Urumqi)

14:00 - 15:30 | Session (9): Technical Track / ERA - Early Research Achievements, Room 2 (Xiangshan Ballroom). Chair(s): Zhiqiang Li

- 14:00, 30m Talk (Technical Track): Multi-Hierarchy Metamorphic Testing for Hyphenated Words in Machine Translation. Rui Zhu (Nanjing University of Aeronautics and Astronautics), Chuanqi Tao (Nanjing University of Aeronautics and Astronautics), Jerry Gao (San Jose State University)
- 14:30, 30m Talk (Technical Track): Exploring the Depths of WebAudio: Advancing Greybox Fuzzing for Enhanced Vulnerability Detection in Safari. Jiashui Wang (Zhejiang University), Jiahui Wang (Zhejiang University), Jundong Xie (Ant Group), Zhenyuan Li (Zhejiang University), Yan Chen (Northwestern University), Peng Qian (Zhejiang University)
- 15:00, 20m Talk (ERA - Early Research Achievements): A Study On C Code Defect Detection With Fine-tuned Large Language Models. Yue Wang (Beihang University), Xu Wang (Beihang University), Hongwei Yu (Beihang University), Fei Gao (Beijing Aerospace Automatic Control Institute), Xueshi Liu (Beijing Aerospace Automatic Control Institute), Xiaoling Wang