SANER 2025
Tue 4 - Fri 7 March 2025 Montréal, Québec, Canada
Thu 6 Mar 2025 14:30 - 14:45 at M-1410 - Search & Similarity. Chair(s): Fatemeh Hendijani Fard

In large-scale software systems, when an error occurs there is often no fully-fledged bug report with a human-written description. In this case, developers rely on stack traces, i.e., the sequences of function calls that led to the error. Since tens or even hundreds of thousands of stack traces from different users can describe the same issue, they must be automatically deduplicated into categories before they can be processed. Recent works have proposed powerful deep learning-based approaches for this task, but these approaches are evaluated and compared in isolation from real-life workflows, so it is unclear whether they will actually work well at scale.

To close this gap, this work makes three main contributions: a novel model, an industry-based dataset, and a multi-faceted evaluation. Our model consists of two parts: (1) an embedding model that uses byte-pair encoding and approximate nearest neighbor search to quickly find the stack traces most relevant to an incoming one, and (2) a reranker that reorders the best-fitting candidates, taking into account the frames shared between them. To complement the existing datasets collected from open-source projects, we share with the community SlowOps, a dataset of stack traces from IntelliJ-based products developed by JetBrains, which has an order of magnitude more stack traces per category. Finally, we carry out an evaluation that strives to be realistic, measuring not only categorization accuracy but also operation time and the ability to create new categories. The evaluation shows that our model strikes a good balance: it outperforms other models on both the open-source datasets and SlowOps, while also being faster than most of them. We release all of our code and data, and hope that our work paves the way for further practice-oriented research in this area.
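The two-stage design described in the abstract (fast retrieval of candidates, then reranking by shared frames, then either joining an existing category or opening a new one) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the embedding model, the nearest neighbor index, or the reranker's scoring function, so every component below is a hypothetical stand-in. Splitting frame names on non-alphanumeric characters replaces byte-pair encoding, a bag-of-tokens cosine similarity replaces the learned embedding, an exact top-k sort replaces approximate nearest neighbor search, and a depth-weighted fraction of shared frames replaces the learned reranker. The names `Deduplicator`, `rerank_score`, `k`, and `threshold` are illustrative only.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

Trace = list[str]  # frames of one stack trace, top of the stack first


def tokenize(frame: str) -> list[str]:
    # Hypothetical stand-in for byte-pair encoding: split a frame name such as
    # "com.example.io.FileReader.read" into lowercase alphanumeric pieces.
    return re.findall(r"[a-z0-9]+", frame.lower())


def embed(trace: Trace) -> Counter:
    # Hypothetical stand-in for a learned embedding: a bag of frame tokens.
    return Counter(tok for frame in trace for tok in tokenize(frame))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query: Trace, candidate: Trace) -> float:
    # Hypothetical reranker: weighted fraction of the query's frames that also
    # occur in the candidate; frames nearer the top of the stack weigh more.
    candidate_frames = set(candidate)
    weights = [1.0 / (1 + depth) for depth in range(len(query))]
    shared = sum(w for w, frame in zip(weights, query) if frame in candidate_frames)
    return shared / sum(weights) if weights else 0.0


@dataclass
class Deduplicator:
    k: int = 5              # candidates passed from retrieval to the reranker
    threshold: float = 0.5  # best rerank score below this opens a new category
    history: list[tuple[Trace, Counter, int]] = field(default_factory=list)
    next_category: int = 0

    def add(self, trace: Trace) -> int:
        query = embed(trace)
        # Stage 1: exact top-k cosine search over all previously seen traces
        # (an approximate nearest neighbor index would replace this at scale).
        candidates = sorted(
            self.history, key=lambda item: cosine(query, item[1]), reverse=True
        )[: self.k]
        # Stage 2: rerank only the retrieved candidates by their shared frames.
        best_category, best_score = None, 0.0
        for cand_trace, _, category in candidates:
            score = rerank_score(trace, cand_trace)
            if score > best_score:
                best_category, best_score = category, score
        # No sufficiently similar trace was found: create a new category.
        if best_category is None or best_score < self.threshold:
            best_category = self.next_category
            self.next_category += 1
        self.history.append((trace, query, best_category))
        return best_category


if __name__ == "__main__":
    dedup = Deduplicator()
    t1 = ["com.example.io.FileReader.read", "com.example.app.Loader.load", "com.example.app.Main.main"]
    t2 = ["com.example.io.FileReader.read", "com.example.app.Loader.load", "com.example.app.Main.run"]
    t3 = ["org.ui.Button.click", "org.ui.EventQueue.dispatch"]
    print([dedup.add(t) for t in (t1, t2, t3)])  # → [0, 0, 1]
```

In this sketch, the `threshold` is what allows new categories to be created: an incoming stack trace whose best reranked candidate scores below it opens a fresh category instead of being forced into an existing one. Because the comparatively expensive reranker only sees the `k` retrieved candidates, its cost per incoming trace stays bounded as the history grows, which illustrates why accuracy and operation time are worth measuring together.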

Thu 6 Mar

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Search & Similarity (Research Papers / Industrial Track) at M-1410
Chair(s): Fatemeh Hendijani Fard (University of British Columbia)
14:00 (15m) Talk
BinEGA: Enhancing DNN-based Binary Code Similarity Detection through Efficient Graph Alignment
Research Papers
Shize Zhou (Zhejiang University), Lirong Fu (Hangzhou Dianzi University), Peiyu Liu (Zhejiang University), Wenhai Wang (Zhejiang University)
14:15 (15m) Talk
Evaluating the Effectiveness and Efficiency of Demonstration Retrievers in RAG for Code Tasks
Research Papers
Pengfei He (University of Manitoba), Shaowei Wang (University of Manitoba), Shaiful Chowdhury (University of Manitoba), Tse-Hsun (Peter) Chen (Concordia University)
14:30 (15m) Talk
Stack Trace Deduplication: Faster, More Accurately, and in More Realistic Scenarios
Research Papers
Egor Shibaev (Constructor University, JetBrains), Denis Sushentsev (JetBrains), Yaroslav Golubev (JetBrains Research), Aleksandr Khvorov (JetBrains; Constructor University Bremen)
Pre-print
14:45 (15m) Talk
Industrial-Scale Neural Network Clone Detection with Disk-Based Similarity Search
Industrial Track
Gul Aftab Ahmed, Muslim Chochlov, Abdul Razzaq, James Vincent Patten, Yuanhua Han, Guoxian Lu, Jim Buckley (Lero - The Irish Software Research Centre and University of Limerick), David Gregg (Trinity College Dublin, Ireland)