BinEGA: Enhancing DNN-based Binary Code Similarity Detection through Efficient Graph Alignment (SANER 2025 - Research Papers)

Who

Shize Zhou, Lirong Fu, Peiyu Liu, Wenhai Wang

Track

SANER 2025 Research Papers

When

Thu 6 Mar 2025 14:00 - 14:15 at M-1410 - Search & Similarity Chair(s): Fatemeh Hendijani Fard

Abstract

Binary Code Similarity Detection (BCSD) is essential in various binary code security applications, enabling tasks such as vulnerability identification, malware analysis, and detection of code plagiarism. With the growing adoption of deep neural networks (DNNs) in BCSD, there has been significant progress in the identification and classification of similar code segments. However, DNN-based BCSD approaches often suffer from high false positive rates, because DNNs inevitably map different binary functions with complex structures and semantics to similar low-dimensional embeddings.

To alleviate this issue, this paper introduces BinEGA, a novel graph alignment-based approach to enhance the accuracy of DNN-based BCSD approaches. The main idea of BinEGA is to employ a general and low-cost equivalence check through lightweight graph alignment, allowing for the identification and elimination of semantically deviating functions among the top-k candidates retrieved by DNN-based BCSD approaches. During the graph alignment process, we first obtain the node embeddings according to structural and attribute features. We then employ pairwise comparison of these node embeddings to filter out false positives, because binary code compiled from the same source code always shares similar basic blocks. Our experimental results demonstrate that BinEGA effectively enhances the performance of various cutting-edge DNN-based BCSD approaches across diverse scenarios. For instance, BinEGA significantly enhances Recall@10 in the cross-optimization scenario for state-of-the-art (SOTA) approaches, with an average improvement of 29.2% for BinaryAI and 33.5% for jTrans. Moreover, BinEGA achieves an 88.9% reduction in execution time compared to other enhancement techniques. In summary, this work provides a robust, generalizable, and efficient solution to improve the reliability of BCSD tools in real-world applications.
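
The filtering step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how node embeddings are computed or how the pairwise comparison is scored, so the sketch assumes each function is already represented as a list of basic-block embedding vectors and uses a simple best-match cosine score. The names `cosine`, `alignment_score`, `filter_candidates` and the `threshold` value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_score(query_nodes, cand_nodes):
    """Assumed scoring rule: match every node to its most similar node in the
    other graph, average those best matches, and take the weaker direction."""
    if not query_nodes or not cand_nodes:
        return 0.0

    def one_way(src, dst):
        return sum(max(cosine(s, d) for d in dst) for s in src) / len(src)

    return min(one_way(query_nodes, cand_nodes), one_way(cand_nodes, query_nodes))


def filter_candidates(query_nodes, candidates, threshold=0.9):
    """Keep only the top-k candidates whose basic blocks align well with the query.

    `candidates` is a list of (name, node_embeddings) pairs, e.g. the top-k
    results returned by some DNN-based BCSD model. The threshold is an
    illustrative value, not one taken from the paper.
    """
    return [
        name
        for name, nodes in candidates
        if alignment_score(query_nodes, nodes) >= threshold
    ]


# Toy example: the query has two distinct basic blocks.
query = [[1.0, 0.0], [0.0, 1.0]]
top_k = [
    ("same_blocks", [[0.0, 1.0], [1.0, 0.0]]),      # same blocks, different order
    ("missing_block", [[1.0, 0.0], [1.0, 0.0]]),    # one query block has no match
]
print(filter_candidates(query, top_k))
```

In this toy case the second candidate is dropped because one of the query's basic blocks has no similar counterpart, which mirrors the intuition stated in the abstract that functions compiled from the same source share similar basic blocks.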

Shize Zhou

Zhejiang University

China

Lirong Fu

Hangzhou Dianzi University

China

Peiyu Liu

Zhejiang University

China

Wenhai Wang

Zhejiang University

China

Session Program

Thu 6 Mar
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30	Search & Similarity (Research Papers / Industrial Track) at M-1410. Chair(s): Fatemeh Hendijani Fard (University of British Columbia)

14:00 15m Talk		BinEGA: Enhancing DNN-based Binary Code Similarity Detection through Efficient Graph Alignment. Research Papers. Shize Zhou (Zhejiang University), Lirong Fu (Hangzhou Dianzi University), Peiyu Liu (Zhejiang University), Wenhai Wang (Zhejiang University)
14:15 15m Talk		Evaluating the Effectiveness and Efficiency of Demonstration Retrievers in RAG for Code Tasks. Research Papers. Pengfei He (University of Manitoba), Shaowei Wang (University of Manitoba), Shaiful Chowdhury (University of Manitoba), Tse-Hsun (Peter) Chen (Concordia University)
14:30 15m Talk		Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios. Research Papers. Egor Shibaev (Constructor University, JetBrains), Denis Sushentsev (JetBrains), Yaroslav Golubev (JetBrains Research), Aleksandr Khvorov (JetBrains; Constructor University Bremen)
14:45 15m Talk		Industrial-Scale Neural Network Clone Detection with Disk-Based Similarity Search. Industrial Track. Gul Aftab Ahmed, Muslim Chochlov, Abdul Razzaq, James Vincent Patten, Yuanhua Han, Guoxian Lu, Jim Buckley (Lero - The Irish Software Research Centre and University of Limerick), David Gregg (Trinity College Dublin, Ireland)