Improving Cross-Platform Binary Analysis using Representation Learning via Graph Alignment
Thu 21 Jul 2022 17:00 - 17:20 at ISSTA 2 - Session 3-6: Neural Networks, Learning, NLP F
Cross-platform binary analysis requires a common representation of binaries across platforms on which a specific analysis can be performed. Recent work has proposed learning low-dimensional numeric vector representations (i.e., embeddings) of disassembled binary code and performing binary analysis in the embedding space. However, existing techniques fall short: they are either (i) specific to a single platform, producing embeddings that are not aligned across platforms, or (ii) not designed to capture the rich contextual information available in a disassembled binary.
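As an illustration of what "binary analysis in the embedding space" can look like once embeddings are aligned across platforms, the following minimal Python sketch matches an x86-64 basic block to its most similar ARM basic block by cosine similarity. The embedding values, the block names, and the choice of cosine similarity are hypothetical assumptions made for illustration; they are not taken from XBA.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_counterpart(query, candidates):
    """Return the name in `candidates` whose embedding is most similar to `query`."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Hypothetical, hand-picked embeddings: one x86-64 basic block and three ARM candidates.
x86_block = [0.9, 0.1, 0.3]
arm_blocks = {
    "arm_bb_0": [0.1, 0.8, 0.2],
    "arm_bb_1": [0.85, 0.15, 0.35],
    "arm_bb_2": [0.0, 0.2, 0.9],
}
print(nearest_counterpart(x86_block, arm_blocks))  # arm_bb_1
```

If the embeddings were not aligned (limitation (i) above), the same nearest-neighbor lookup would compare vectors from unrelated spaces, and its answer would carry no cross-platform meaning.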
We present XBA, a novel deep learning-based method that addresses these problems. To this end, we first abstract binaries as typed graphs, dubbed binary disassembly graphs (BDGs), which encode control flow and other rich contextual information about the different entities found in a disassembled binary, including basic blocks, called external functions, and referenced string literals. We then formulate binary code representation learning as a graph alignment problem, i.e., finding the node correspondences between BDGs extracted from two binaries compiled for different platforms. XBA uses graph convolutional networks to learn the semantics of each node by (i) using its rich contextual information encoded in the BDG and (ii) aligning its embeddings across platforms. Our formulation allows XBA to learn semantic alignments between two BDGs in a semi-supervised manner, requiring only a limited number of node pairs to be aligned across platforms for training. Our evaluation shows that XBA can learn semantically rich embeddings of binaries that are aligned across platforms without a priori platform-specific knowledge. When trained with only 50% of the oracle alignments, XBA predicted, on average, 75% of the remaining ones. Our case studies further show that the learned embeddings encode knowledge useful for cross-platform binary analysis.
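The pipeline described above can be sketched in miniature: one graph-convolution layer with weights shared across platforms embeds two toy BDGs, a margin-based loss scores a handful of seed pairs, and the remaining nodes are aligned by nearest neighbor. This is a minimal Python sketch under several assumptions that the abstract does not state: one-hot node-type features, the Kipf & Welling normalized propagation rule, a single fixed (untrained) weight matrix `W`, L1 distance, and a GCN-Align-style hinge loss. All graphs and node roles are hypothetical, and the two BDGs are isomorphic by construction, so the sketch shows the mechanics of alignment rather than the hard cases XBA targets.

```python
import math

# One-hot node-type features (an assumption of this sketch, not XBA's actual features).
TYPES = {"block": [1.0, 0.0, 0.0], "extern": [0.0, 1.0, 0.0], "string": [0.0, 0.0, 1.0]}


def normalized_adjacency(n, edges):
    """D^-1/2 (A + I) D^-1/2, treating every edge as undirected."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat H W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def embed(node_types, edges, w):
    h = [TYPES[t] for t in node_types]
    return gcn_layer(normalized_adjacency(len(node_types), edges), h, w)


def l1(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def margin_loss(ea, eb, seeds, negatives, margin=1.0):
    """Hinge loss: each seed pair (i, j) should be closer than the corrupted pair (i, k) by `margin`."""
    return sum(max(0.0, l1(ea[i], eb[j]) + margin - l1(ea[i], eb[k]))
               for (i, j), k in zip(seeds, negatives))


def predict(ea, eb, i):
    """Align node i of graph A to its nearest node (L1 distance) in graph B."""
    return min(range(len(eb)), key=lambda j: l1(ea[i], eb[j]))


# Weights shared by both platforms; fixed here, whereas a real model would learn them.
W = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.5, 0.0, 1.0]]

# Hypothetical x86-64 BDG: 0 entry, 1 loop, 2 exit (basic blocks); 3 printf, 5 malloc
# (external functions); 4 a format string. Edges: control flow, calls, string references.
x86_types = ["block", "block", "block", "extern", "string", "extern"]
x86_edges = [(0, 1), (1, 2), (1, 3), (1, 4), (0, 5)]
# Hypothetical ARM BDG with the same structure under a different node numbering.
arm_types = ["block", "extern", "extern", "block", "string", "block"]
arm_edges = [(3, 0), (0, 5), (0, 1), (0, 4), (3, 2)]
truth = {0: 3, 1: 0, 2: 5, 3: 1, 4: 4, 5: 2}  # x86 node -> ARM node (oracle alignment)

ea, eb = embed(x86_types, x86_edges, W), embed(arm_types, arm_edges, W)
seeds = [(0, 3), (3, 1), (4, 4)]  # half of the oracle pairs used as supervision
negatives = [0, 2, 5]             # one corrupted ARM node per seed pair
print("seed loss:", round(margin_loss(ea, eb, seeds, negatives), 4))

held_out = [i for i in truth if i not in dict(seeds)]
hits = sum(predict(ea, eb, i) == truth[i] for i in held_out)
print(f"Hits@1 on held-out nodes: {hits}/{len(held_out)}")
```

In a trained model an optimizer would update `W` to drive `margin_loss` down on the seed pairs; sharing `W` across both graphs is what places nodes from both platforms in one embedding space. Hits@1 counts held-out nodes whose nearest neighbor is their true counterpart.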
Wed 20 Jul (displayed time zone: Seoul)
08:40 - 09:40
08:40 | 20m Talk | ASRTest: Automated Testing for Deep-Neural-Network-Driven Speech Recognition Systems | Technical Papers | Pin Ji (Nanjing University), Yang Feng (Nanjing University), Jia Liu (Nanjing University), Zhihong Zhao (Nanjing Tech University), Zhenyu Chen (Nanjing University)
09:00 | 20m Talk | BET: Black-box Efficient Testing for Convolutional Neural Networks | Technical Papers | Wang Jialai (Tsinghua University), Han Qiu (Tsinghua University), Yi Rong (Tsinghua University), Hengkai Ye (Purdue University), Qi Li (Tsinghua University), Zongpeng Li (Tsinghua University), Chao Zhang (Tsinghua University)
09:20 | 20m Talk | Improving Cross-Platform Binary Analysis using Representation Learning via Graph Alignment | Technical Papers | Geunwoo Kim (University of California, Irvine, USA), Sanghyun Hong (Oregon State University), Michael Franz (University of California, Irvine), Dokyung Song (Yonsei University, South Korea)
Thu 21 Jul (displayed time zone: Seoul)
16:20 - 17:40
16:20 | 20m Talk | AEON: A Method for Automatic Evaluation of NLP Test Cases | Technical Papers | Jen-tse Huang (The Chinese University of Hong Kong), Jianping Zhang (The Chinese University of Hong Kong), Wenxuan Wang (The Chinese University of Hong Kong), Pinjia He (The Chinese University of Hong Kong, Shenzhen), Yuxin Su (Sun Yat-sen University), Michael Lyu (The Chinese University of Hong Kong)
16:40 | 20m Talk | HybridRepair: Towards Annotation-Efficient Repair for Deep Learning Models | Technical Papers
17:00 | 20m Talk | Improving Cross-Platform Binary Analysis using Representation Learning via Graph Alignment | Technical Papers | Geunwoo Kim (University of California, Irvine, USA), Sanghyun Hong (Oregon State University), Michael Franz (University of California, Irvine), Dokyung Song (Yonsei University, South Korea)
17:20 | 20m Talk | Human-in-the-Loop Oracle Learning for Semantic Bugs in String Processing Programs | Technical Papers | Charaka Geethal (Monash University), Thuan Pham (The University of Melbourne), Aldeida Aleti (Monash University), Marcel Böhme (MPI-SP, Germany and Monash University, Australia)