Towards Universal Segmentation for Log Parsing
Log parsing is a crucial step in log analysis, as it transforms unstructured log messages into the structured data required by downstream analysis tasks. The sheer volume of log data generated by modern software systems has motivated the development of numerous log parsing techniques in the literature. However, existing log parsers still suffer from unsatisfactory accuracy, which may significantly affect follow-up analyses such as log-based anomaly detection. We have identified two main limitations that hinder the effectiveness of existing log parsing methods: (1) under-segmentation: most log parsers rely on a fixed, predefined set of delimiters to separate a log message into tokens, which may fail to split log messages correctly due to the heterogeneity of logging formats; (2) over-segmentation: using too many delimiters fragments meaningful units in log messages and makes it difficult to accurately identify templates and parameters. To address these limitations, we propose SCLog, a novel syntax- and context-aware segmentation approach for log parsing. SCLog leverages a comprehensive set of syntax-based heuristics to segment log messages into coarse-grained tokens. To further split these into fine-grained tokens, SCLog mines the structural patterns of tokens based on their surrounding contexts and dynamically identifies the optimal delimiters for each token. We evaluate SCLog on the widely used, large-scale Loghub-2.0 datasets. The results demonstrate that SCLog significantly outperforms state-of-the-art log parsers in terms of parsing accuracy and robustness across diverse datasets.
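The two limitations named in the abstract can be illustrated with a minimal sketch of fixed-delimiter tokenization. This is not SCLog's algorithm, which the abstract does not specify in detail; the `tokenize` helper, the sample log message, and both delimiter sets are hypothetical and chosen only to show how a small delimiter set leaves tokens fused while a large one fragments meaningful units.

```python
import re


def tokenize(message: str, delimiters: str) -> list[str]:
    """Split a message on any character in `delimiters`, dropping empty tokens.

    Hypothetical helper for illustration; not part of SCLog.
    """
    pattern = "[" + re.escape(delimiters) + "]+"
    return [tok for tok in re.split(pattern, message) if tok]


# Hypothetical log message, invented for this example.
message = "Connection from 10.0.0.5:8080 closed,status=OK"

# Under-segmentation: whitespace alone leaves "closed,status=OK" fused.
few = tokenize(message, " ")

# Over-segmentation: a large fixed set splits the IP address and port apart.
many = tokenize(message, " ,=:.")

print(few)
print(many)
```

With whitespace only, the key-value pair is never separated from the preceding word; with the larger set, the address `10.0.0.5` is broken into four meaningless fragments. Neither fixed choice works for both parts of the message, which is the motivation the abstract gives for choosing delimiters per token.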
Mon 13 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30 | Session 5 - Summarization, Documentation, and Code Review | Research Track / Vaclav Rajlich Early Career Award / ICPC Program / Journal First | at Europa II | Chair(s): Masud Rahman (Dalhousie University)
11:00 | 10m Talk | Vaclav Rajlich Award | Vaclav Rajlich Early Career Award | Marvin Wyrich (Saarland University)
11:10 | 10m Talk | RepoMind: Enhancing Repository-Level Code Generation via LLM Reasoning over Structured Repository Documentation | Research Track | Songwen Gong (South China University of Technology); Mengzhen Wang (South China University of Technology); Jiexin Wang (South China University of Technology); Yi Cai (School of Software Engineering, South China University of Technology, Guangzhou, China)
11:20 | 10m Talk | SQL-Commenter: Aligning Large Language Models for SQL Comment Generation with Direct Preference Optimization | Research Track | Lei Yu (Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, China); Peng Wang (Institute of Information Engineering, Chinese Academy of Sciences); Jingyuan Zhang (Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences, China); Xin Wang (Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences); Jia Xu (Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences); Li Yang (Institute of Software, Chinese Academy of Sciences); Changzhi Deng (Institute of Software, Chinese Academy of Sciences); Jiajia Ma (Institute of Software, Chinese Academy of Sciences, China); Fengjun Zhang (Institute of Software, Chinese Academy of Sciences, China)
11:30 | 10m Talk | Studying Quality Improvements Recommended via Manual and Automated Code Review | Research Track | Giuseppe Crupi (Università della Svizzera italiana); Rosalia Tufano (Università della Svizzera Italiana); Gabriele Bavota (Software Institute @ Università della Svizzera Italiana)
11:40 | 10m Talk | Towards Universal Segmentation for Log Parsing | Research Track | Van-Hoang Le (University of Luxembourg, Luxembourg); Domenico Bianculli (University of Luxembourg); Huy-Trung Nguyen (Posts and Telecommunications Institute of Technology)
11:50 | 10m Talk | DPS: Design Pattern Summarisation Using Code Features | Journal First | Najam Nazar (Monash University); Sameer Sikka (University of Melbourne); Christoph Treude (Singapore Management University)
12:00 | 10m Talk | On the Impact of Code Comments for Automated Bug-Fixing: An Empirical Study | Research Track | Antonio Vitale (Politecnico di Torino, University of Molise); Emanuela Guglielmi (University of Molise); Simone Scalabrino (University of Molise); Rocco Oliveto (University of Molise)
12:10 | 20m Live Q&A | Joint QA and Discussion | ICPC Program