Effective Vulnerability Detection over Code Token Graph: A GCN with Score Gate Based Approach
In modern society, software systems are integral to various aspects of life. Finding an efficient vulnerability identification approach is crucial for ensuring security and preventing malicious attacks. In recent years, many deep learning-based methods have shown outstanding performance in the vulnerability detection task. However, these methods still have limitations. Some methods treat code input as token sequences and apply architectures typically used in natural language processing. They fail to utilize the structural information arising from the interactions among code components, which limits these models’ performance. Other methods based on graph neural networks, although better at learning structural information, treat each node equally and fail to emphasize key elements. To overcome these limitations, we propose CTGGSG, an effective vulnerability detection approach over the Code Token Graph (CTG) based on a GCN with a score gate. In our model, we use the PLE-CG-SE module to represent source code samples as CTGs, effectively utilizing the high-quality feature representation of PL-PLM (Pre-trained Language Model based on Program Languages) while retaining structural information from the source code. During the graph learning process, we combine GCN convolution with a score gate mechanism so that the model focuses more on the key nodes within the graph and the receptive field of the nodes increases. To comprehensively evaluate the performance and scalability of our model, we conducted experiments on two real-world datasets: CodeXGLUE, which contains balanced sample labels, and Reveal, which contains imbalanced sample labels. These datasets contain 27,318 and 22,734 function-level samples, respectively, derived from large-scale, popular real-world projects. Compared to existing advanced vulnerability detection methods, our model achieved state-of-the-art performance overall.
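The abstract does not spell out how the score gate is combined with the GCN convolution, so the following is only a minimal sketch of one plausible reading, not the authors' CTGGSG implementation. It assumes a standard GCN propagation step over a symmetrically normalized adjacency matrix with self-loops, followed by a hypothetical sigmoid gate that gives each node a score in (0, 1) and rescales its features by that score. The names `gated_gcn_layer`, `weight`, and `gate_vec` are illustrative assumptions, and the toy graph stands in for a Code Token Graph.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gated_gcn_layer(adj, feats, weight, gate_vec):
    """One GCN step followed by an assumed per-node sigmoid score gate.

    adj:      n x n adjacency matrix of the token graph
    feats:    n x d node features (e.g. token embeddings)
    weight:   d x k projection matrix (hypothetical learned parameter)
    gate_vec: length-k scoring vector (hypothetical learned parameter)
    """
    norm = normalize_adjacency(adj)
    hidden = matmul(matmul(norm, feats), weight)
    hidden = [[max(0.0, v) for v in row] for row in hidden]  # ReLU
    # Score each node, then scale its features so high-scoring nodes dominate.
    scores = [
        1.0 / (1.0 + math.exp(-sum(h * g for h, g in zip(row, gate_vec))))
        for row in hidden
    ]
    gated = [[s * v for v in row] for s, row in zip(scores, hidden)]
    return gated, scores


if __name__ == "__main__":
    # Toy 3-node path graph standing in for a Code Token Graph.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weight = [[1.0, 0.0], [0.0, 1.0]]
    gated, scores = gated_gcn_layer(adj, feats, weight, gate_vec=[0.5, -0.5])
    print(scores)
    print(gated)
```

Stacking several such layers would let information travel more hops across the graph, which is one common way a node's receptive field grows; whether the paper enlarges the receptive field this way is not stated in the abstract.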
Wed 4 Dec (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
14:00 - 15:30
14:00 (30m) Talk: Unraveling the Potential of Large Language Models in Code Translation: How Far Are We? Technical Track. Qingxiao Tao (School of Software, Shanghai Jiao Tong University, Shanghai, China); Tingrui Yu (School of Software, Shanghai Jiao Tong University, Shanghai, China); Xiaodong Gu (Shanghai Jiao Tong University); Beijun Shen (Shanghai Jiao Tong University)
14:30 (30m) Talk: Effective Vulnerability Detection over Code Token Graph: A GCN with Score Gate Based Approach. Technical Track. Nong Zou (Southwest University); Nan Li (Southwest University); Junxiang Zhang (Southwest University); Xiaomeng Wang (Southwest University); Hong Lai (Southwest University); Tao Jia (Southwest University)
15:00 (30m) Talk: Putting APIs in the Right Order with Gated Graph Neural Networks. Technical Track.