Exploring GNN Based Program Embedding Technologies for Binary related Tasks
As programs grow rapidly in scale, program analysis, maintenance and optimization become increasingly diverse and complex. Applying learning-assisted methods to program analysis has attracted ever-increasing attention. However, many program factors, including syntax structures, semantics, running platforms and compilation configurations, hinder the effective realization of these methods. To overcome these obstacles, existing work typically operates on source code or abstract syntax trees, which is sub-optimal for binary-oriented analysis tasks closely tied to the compilation process. To this end, we propose a new program analysis approach that aims to solve program-level and procedure-level tasks with one model, applying graph neural networks at the level of binary code. By fusing the semantics of control flow graphs, data flow graphs and call graphs into one model, and embedding instructions and values simultaneously, our method can effectively work around emerging compilation-related problems. Evaluated on two tasks, binary similarity detection and dead store prediction, our method achieves accuracy as high as 83.25% and 82.77%.
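To make the idea of one fused graph more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy graph whose nodes are either instructions or values and whose edges are tagged as control-flow, data-flow or call edges, and it runs a single round of message passing with fixed, made-up edge weights. All node names, the weights and the helper functions are hypothetical; a real GNN would learn its parameters and use far richer features.

```python
from collections import defaultdict

# Hypothetical toy graph: node names and kinds are illustrative only.
NODES = {
    "i0": "instr",  # e.g. a store instruction
    "i1": "instr",  # e.g. a call instruction
    "v0": "value",  # e.g. a register or memory value
    "f0": "instr",  # entry instruction of a callee
}

# Edges from three graph views fused into one edge list.
EDGES = [
    ("i0", "i1", "control"),  # control flow: i0 executes before i1
    ("v0", "i0", "data"),     # data flow: i0 uses v0
    ("i1", "f0", "call"),     # call graph: i1 calls into f0
]

# Assumed fixed per-edge-type weights; a trained model would learn these.
EDGE_WEIGHT = {"control": 0.5, "data": 1.0, "call": 0.25}


def initial_embedding(kind):
    """One-hot start vector that separates instructions from values."""
    return [1.0, 0.0] if kind == "instr" else [0.0, 1.0]


def message_passing_step(nodes, edges, state):
    """One round: each node adds the weighted vectors of its predecessors."""
    incoming = defaultdict(list)
    for src, dst, etype in edges:
        incoming[dst].append((src, etype))

    new_state = {}
    for node in nodes:
        vec = list(state[node])
        for src, etype in incoming[node]:
            weight = EDGE_WEIGHT[etype]
            vec = [a + weight * b for a, b in zip(vec, state[src])]
        new_state[node] = vec
    return new_state


state = {name: initial_embedding(kind) for name, kind in NODES.items()}
state = message_passing_step(NODES, EDGES, state)
```

After this single step, the vector for `i0` mixes in information from the value `v0` it uses, while `i1` and `f0` pick up information along the control-flow and call edges. Stacking more rounds would let information travel across several edge types, which is the intuition behind embedding instructions and values in one graph.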
Mon 16 May. Displayed time zone: Eastern Time (US & Canada)
21:00 - 21:50 | Session 9: Program Representation 2 | Research, at ICPC room | Chair(s): Lingxiao Jiang (Singapore Management University)
21:00 | 7m | Talk | HELoC: Hierarchical Contrastive Learning of Source Code Representation | Research | Xiao Wang (Shandong Normal University), Qiong Wu (Shandong Normal University), Hongyu Zhang (University of Newcastle), Chen Lyu (Shandong Normal University), Xue Jiang (Shandong Normal University), Zhuoran Zheng (Nanjing University of Science and Technology), Lei Lyu (Shandong Normal University), Songlin Hu (Institute of Information Engineering, Chinese Academy of Sciences)
21:07 | 7m | Talk | Exploring GNN Based Program Embedding Technologies for Binary related Tasks | Research | Yixin Guo (Peking University), Pengcheng Li (Google, Inc), Yingwei Luo (Peking University), Xiaolin Wang (Peking University), Zhenlin Wang (Michigan Technological University)
21:14 | 7m | Talk | Learning Heterogeneous Type Information in Program Graphs | Research | Kechi Zhang (Peking University), Wenhan Wang (Nanyang Technological University), Huangzhao Zhang (Peking University), Ge Li (Peking University), Zhi Jin (Peking University)
21:21 | 7m | Talk | Unified Abstract Syntax Tree Representation Learning for Cross-language Program Classification | Research | Kesu Wang (Nanjing University), Meng Yan (Chongqing University), He Zhang (Nanjing University), Haibo Hu (Chongqing University)
21:28 | 7m | Talk | On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages | Research | Fuxiang Chen (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia), David Lo (Singapore Management University), Timofey Bryksin (JetBrains Research; HSE University)
21:35 | 15m | Live Q&A | Q&A-Paper Session 9 | Research