A Mocktail of Source Code Representations
Efficient representation of source code is essential for software engineering tasks such as code search and code clone detection. One technique for representing source code extracts paths from the abstract syntax tree (AST) and uses a learning model to capture program properties. Code2vec is a commonly used path-based approach that uses an attention-based neural network to learn code embeddings, which can then be applied to various software engineering tasks. However, code2vec uses only ASTs and does not leverage other graph structures such as Control Flow Graphs (CFG) and Program Dependency Graphs (PDG). Similarly, most recent approaches for representing source code still rely on ASTs and do not leverage semantic graph structures. Although an integrated graph approach, the Code Property Graph, exists for representing source code, it has been explored only in the domain of software security, and it does not leverage paths from the individual graphs. In our work, we extend the path-based approach code2vec to include paths from the semantic graphs (CFG and PDG) alongside the AST, a combination that remains largely unexplored in software engineering. We evaluate our approach on the task of method naming using a custom C dataset of 730K methods collected from 16 C projects on GitHub. Compared to code2vec, our approach improves the F1 score by 11% on the full dataset and by up to 100% on individual projects. We show that semantic features from the CFG and PDG paths are indeed helpful. We envision that looking at a mocktail of source code representations for various software engineering tasks can lay the foundation for a new line of research and an overhaul of existing research.
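To make the path-based idea concrete, here is a minimal Python sketch of code2vec-style path-context extraction over a toy AST: every pair of leaves yields a triple of (start token, path of node types through their lowest common ancestor, end token). The `Node` class, the `extract_path_contexts` function, the `max_nodes` cap, and the ASCII notation (`^` for an upward step, `_` for a downward step) are illustrative assumptions, not code2vec's actual extractor. The sketch covers only trees and does not show how the authors derive paths from CFGs or PDGs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Node:
    kind: str                    # AST node type, e.g. "Assign", "Name"
    token: Optional[str] = None  # source token, set only on leaves
    children: List[Node] = field(default_factory=list)


def _root_to_leaf_stacks(root: Node) -> List[List[Node]]:
    """For every leaf (in depth-first order), list the nodes from the root down to it."""
    stacks: List[List[Node]] = []

    def walk(node: Node, stack: List[Node]) -> None:
        stack = stack + [node]
        if not node.children:
            stacks.append(stack)
        for child in node.children:
            walk(child, stack)

    walk(root, [])
    return stacks


def extract_path_contexts(root: Node, max_nodes: int = 8) -> List[Tuple[str, str, str]]:
    """Return (start_token, path, end_token) for every pair of leaves.

    Hypothetical helper: max_nodes drops paths with more node types than the cap.
    """
    contexts: List[Tuple[str, str, str]] = []
    for a, b in combinations(_root_to_leaf_stacks(root), 2):
        # Length of the shared root prefix; its last node is the lowest common ancestor.
        i = 0
        while i < min(len(a), len(b)) and a[i] is b[i]:
            i += 1
        lca = i - 1
        up = [n.kind for n in reversed(a[lca:])]  # from leaf a up to the LCA
        down = [n.kind for n in b[lca + 1:]]      # from below the LCA down to leaf b
        if len(up) + len(down) > max_nodes:
            continue  # skip overly long paths
        path = "^".join(up) + "".join("_" + k for k in down)
        contexts.append((a[-1].token, path, b[-1].token))
    return contexts


# Toy AST for the statement `x = y + 1`.
tree = Node("Assign", children=[
    Node("Name", "x"),
    Node("BinOp", children=[Node("Name", "y"), Node("Num", "1")]),
])

for ctx in extract_path_contexts(tree):
    print(ctx)
# ('x', 'Name^Assign_BinOp_Name', 'y')
# ('x', 'Name^Assign_BinOp_Num', '1')
# ('y', 'Name^BinOp_Num', '1')
```

In code2vec, each such triple is embedded and the attention network pools the bag of triples into a single method vector. Extending extraction to CFGs or PDGs would require a graph traversal that copes with cycles (for example, loops in a CFG), which this tree-only sketch does not attempt.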
Tue 16 Nov (displayed time zone: Hobart)
12:00 - 13:00 | Programming | Journal-first Papers / Research Papers / NIER track at Kangaroo | Chair(s): Amiangshu Bosu (Wayne State University)
12:00 | 20m Talk | Detecting TensorFlow Program Bugs in Real-World Industrial Environment | Research Papers | Chen Liu, Jie Lu (SKL Computer Architecture, ICT, CAS), Guangwei Li (Institute of Computing Technology), Ting Yuan (SKL Computer Architecture, ICT, CAS; University of Chinese Academy of Sciences, China), Lian Li (Institute of Computing Technology at Chinese Academy of Sciences, China), Feng Tan (Alibaba Group), Jun Yang (Alibaba Group), Liang You (Alibaba Group), Jingling Xue (UNSW Sydney)
12:20 | 20m Talk | Why Do Developers Remove Lambda Expressions in Java? | Research Papers | Mingwei Zheng (Huazhong University of Science and Technology), Jun Yang (Huazhong University of Science and Technology), Ming Wen (Huazhong University of Science and Technology), Hengcheng Zhu (The Hong Kong University of Science and Technology), Yepang Liu (Southern University of Science and Technology), Hai Jin (Huazhong University of Science and Technology)
12:40 | 10m Talk | A Mocktail of Source Code Representations | NIER track | Dheeraj Vagavolu (RISHA Lab, Indian Institute of Technology Tirupati), Karthik Chandra Swarna (RISHA Lab, Indian Institute of Technology Tirupati), Sridhar Chimalakonda (RISHA Lab, Indian Institute of Technology Tirupati)
12:50 | 10m Talk | On Tracking Java Methods with Git Mechanisms | Journal-first Papers | Yoshiki Higo (Osaka University), Shinpei Hayashi (Tokyo Institute of Technology), Shinji Kusumoto (Osaka University)