SANER 2025
Tue 4 - Fri 7 March 2025 Montréal, Québec, Canada
Thu 6 Mar 2025 11:15 - 11:30 at L-1720 - Software Analysis & Recommendation Systems Chair(s): Brittany Reid

Log parsing has long been studied by researchers, given its high importance in the software engineering community: the process identifies dynamic variables and constructs log templates from static components. Prior work has proposed many statistic-based log parsers (e.g., Drain), which are highly efficient; unfortunately, they have hit a bottleneck in parsing performance compared with semantic-based log parsers, which require labeling and more computational resources. Meanwhile, we noticed that previous work on log parsing mainly focused on the parsing stage and usually used an ad hoc preprocessing step (e.g., masking numbers or IP addresses). However, we argue that both preprocessing and parsing are essential for log parsers to identify dynamic variables. The lack of understanding of log preprocessing may prevent the optimal use of log parsers and hinder future research on developing parsing algorithms and configuring their preprocessing step. Therefore, our work first studied the existing approaches to log preprocessing, in particular by analyzing the preprocessing steps used for the different log datasets provided in Loghub, a popular log parsing benchmark. We then developed a preprocessing framework based on our findings and evaluated its impact on log parsing. In our experiments, the preprocessing framework significantly boosts the overall performance of the four state-of-the-art statistic-based parsers examined in the study. The best statistic-based log parser, Drain, improved on all four parsing metrics (e.g., the F1 score of template accuracy, FTA, increases by 108.9%). Moreover, compared with the best-performing semantic-based log parsers, it achieved a 28.3% improvement in grouping accuracy (GA), a 38.1% improvement in the F1 score of grouping accuracy (FGA), and an 18.6% improvement in FTA.
Our work pioneers the study of log preprocessing and provides a generalizable framework to advance the state of the art in log parsing.
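To make the ad hoc preprocessing step mentioned in the abstract concrete (masking numbers or IP addresses before parsing), here is a minimal Python sketch. It is not the framework presented in the talk: the rule list, the placeholder tokens, and the helper names `preprocess` and `group_by_template` are all assumptions made for illustration, and grouping by the masked string is only a stand-in for a real parser such as Drain.

```python
import re

# Hypothetical masking rules, applied in order. These are illustrative
# assumptions, not the rules used by the framework described above.
MASK_RULES = [
    # IPv4 address with an optional port, e.g. 10.0.0.1:8080
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    # Hexadecimal literal, e.g. 0x1f
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    # Standalone integer or decimal (not glued to letters, digits, or dots)
    (re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])"), "<NUM>"),
]


def preprocess(line: str) -> str:
    """Replace likely dynamic variables with placeholder tokens."""
    for pattern, placeholder in MASK_RULES:
        line = pattern.sub(placeholder, line)
    return line


def group_by_template(lines):
    """Group raw log lines by their masked form (a stand-in for a parser)."""
    groups = {}
    for line in lines:
        groups.setdefault(preprocess(line), []).append(line)
    return groups


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1:8080 closed after 35 ms",
        "Connection from 192.168.1.7:443 closed after 1200 ms",
    ]
    for template, members in group_by_template(logs).items():
        print(template, len(members))
```

With these rules, both sample lines collapse to the same masked string, which is the effect a preprocessing step aims for before a statistic-based parser sees the logs. Identifiers such as `blk_123` are left untouched by the number rule, which shows why hand-written rule sets tend to be dataset-specific.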

Thu 6 Mar

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
Software Analysis & Recommendation Systems - Research Papers / Industrial Track / Early Research Achievement (ERA) Track at L-1720
Chair(s): Brittany Reid Nara Institute of Science and Technology
11:00
15m
Talk
A First Look at Package-to-Group Mechanism: An Empirical Study of the Linux Distributions
Research Papers
Dongming Jin Key Lab of High-Confidence of Software Technologies (PKU), Ministry of Education, NIANYU LI ZGC Lab, China, Kai Yang Zhongguancun Laboratory, Minghui Zhou Peking University, Zhi Jin Peking University
11:15
15m
Talk
Preprocessing is All You Need: Boosting the Performance of Log Parsers With a General Preprocessing Framework
Research Papers
Qiaolin Qin Polytechnique Montréal, Roozbeh Aghili Polytechnique Montréal, Heng Li Polytechnique Montréal, Ettore Merlo Polytechnique Montréal
Pre-print
11:30
7m
Talk
Boosting Large Language Models for System Software Retargeting: A Preliminary Study
Early Research Achievement (ERA) Track
Ming Zhong SKLP, Institute of Computing Technology, CAS, Fang Lv Institute of Computing Technology, Chinese Academy of Sciences, Lulin Wang, Lei Qiu SKLP, Institute of Computing Technology, CAS; University of Chinese Academy of Sciences, Hongna Geng SKLP, Institute of Computing Technology, CAS, Huimin Cui Institute of Computing Technology, Chinese Academy of Sciences, Xiaobing Feng ICT CAS
11:37
15m
Talk
Analyzing Logs of Large-Scale Software Systems using Time Curves Visualization
Industrial Track
11:52
15m
Talk
Building Your Own Product Copilot: Challenges, Opportunities, and Needs
Industrial Track
Chris Parnin Georgia Tech, Gustavo Soares Microsoft, Rahul Pandita GitHub, Inc., Sumit Gulwani Microsoft, Jessica Rich, Austin Henley University of Tennessee
12:07
15m
Talk
Filter-based Repair of Semantic Segmentation in Safety-Critical Systems
Industrial Track
Sebastian Schneider, Tomas Sujovolsky, Paolo Arcaini National Institute of Informatics, Fuyuki Ishikawa National Institute of Informatics, Truong Vinh Truong Duy