Learning Heterogeneous Abstract Code Graph Representations For Program Comprehension
Program comprehension is a fundamental activity in software engineering. However, understanding source code efficiently and accurately is challenging, because programs with similar semantics can differ in syntax. Recent state-of-the-art research has demonstrated that combining deep learning techniques with structural information from source code, specifically AST-based static graphs, can enhance the extraction of essential features from source programs. Control flow and data flow information can express richer semantics, yet existing studies often overlook integrating them heterogeneously when constructing program static graphs. This oversight discards the type of each static graph edge, potentially impeding program comprehension.
In this paper, we model the source program as a heterogeneous static graph and use a Relational Graph Convolutional Network (R-GCN) for feature extraction. Specifically, we present a new method for constructing a program static graph, termed the Heterogeneous Abstract Code Graph (HACG), and then employ R-GCN to generate HACG-based representations for code classification and code clone detection. We evaluate our method on two large source code datasets: CodeNet, introduced by IBM, and BigCloneBench. The experimental results show that our approach outperforms existing methods, achieving a code classification accuracy of 97.38% and an average F1-score of 98.34% in code clone detection.
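To make the role of edge types concrete, the following is a minimal sketch of one R-GCN propagation step over a graph with typed edges. It is not the paper's implementation: the edge type names (`ast`, `control_flow`, `data_flow`), the toy graph, and the weight matrices are all assumptions chosen for illustration, and the abstract does not specify how HACG is actually constructed. The layer follows the standard R-GCN update, in which each node combines a self-connection with per-relation, mean-normalized messages from its incoming neighbours, followed by ReLU.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(features, edges, weights, self_weight):
    """One R-GCN step: h_i' = ReLU(W_0 h_i + sum_r mean_{j in N_r(i)} W_r h_j).

    features    : list of node feature vectors
    edges       : list of (src, dst, edge_type) triples
    weights     : dict mapping edge_type -> weight matrix W_r
    self_weight : weight matrix W_0 for the self-connection
    """
    # Group incoming neighbours by (destination node, edge type).
    incoming = defaultdict(list)
    for src, dst, etype in edges:
        incoming[(dst, etype)].append(src)

    out = []
    for i, h in enumerate(features):
        acc = matvec(self_weight, h)
        for etype, W in weights.items():
            srcs = incoming.get((i, etype), [])
            for j in srcs:
                msg = matvec(W, features[j])
                # Normalize by the number of neighbours under this relation.
                acc = [a + m / len(srcs) for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


# Hypothetical toy graph: three nodes, one edge of each assumed type.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1, "ast"), (2, 1, "data_flow"), (0, 2, "control_flow")]

identity = [[1.0, 0.0], [0.0, 1.0]]
weights = {
    "ast": identity,
    "control_flow": [[2.0, 0.0], [0.0, 2.0]],
    "data_flow": [[0.0, 1.0], [1.0, 0.0]],
}

hidden = rgcn_layer(features, edges, weights, self_weight=identity)
print(hidden)
```

Because each relation has its own weight matrix, the message along the assumed `control_flow` edge is transformed differently from the one along the `ast` edge. A homogeneous graph would apply a single matrix to every edge and so could not make that distinction.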
Thu 5 Dec (displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi)
14:00 - 15:30 | Session (10) | Technical Track / SEIP - Software Engineering in Practice | Room 3 (Xiangquan Ballroom) | Chair(s): In-Young Ko (Korea Advanced Institute of Science and Technology)
14:00 | 30m | Talk | Why not Just Look For Answers? Using A More Direct Way for API Recommendation | Technical Track | Changxin Liu (Chongqing University), Ling Xu (School of Big Data & Software Engineering, Chongqing University), Wenhan Mu (Chongqing University), Rui Qin (Chongqing University)
14:30 | 30m | Talk | Learning Heterogeneous Abstract Code Graph Representations For Program Comprehension | Technical Track | Shenning Song, Mengxi Zhang, Shaoquan Li, Huaxiao Liu (all: The College of Computer Science and Technology, Jilin University)
15:00 | 20m | Talk | CoSTV: Accelerating Code Search with Two-Stage Paradigm and Vector Retrieval | SEIP - Software Engineering in Practice | Dewu Zheng, Yanlin Wang, Wenqing Chen, Jiachi Chen, Zibin Zheng (all: Sun Yat-sen University)