Tue 29 Apr 2025 14:24 - 14:30 at 212 - Session 3: Maintenance (talks and panel) Chair(s): Alexander Serebrenik

Logs usually contain rich information about the run-time behavior of a software system, and prior research has proposed various log-based software analysis techniques. Log parsing is the first step of these techniques: it transforms logs from unstructured text into categorical data with a structured format. Because log files are usually large, many automated log parsing techniques have been proposed. However, applying log parsing techniques in practice still faces many challenges. I divide these challenges into two categories: 1) evaluation-related challenges and 2) practical-application-related challenges. The former make it hard for practitioners to choose a proper log parsing technique, and the latter make it hard for practitioners to apply log parsing techniques in practice. I identify one evaluation-related challenge in this paper: the datasets used in evaluation benchmarks for log parsing techniques are limited. To address this challenge, I propose a semi-automatic approach to generate oracle templates for extra-large log datasets; the oracle templates can then be used to generate ground truth for log parsing benchmarks. I also identify three practical-application-related challenges: 1) insufficient knowledge to configure parsing tools, 2) incompatibility with non-English logs, and 3) the semantic knowledge of the dynamic information is usually not encapsulated. To address the first challenge, I propose a parameter-insensitive log parsing technique that uses entropy to identify dynamic variables and static text. To address the second challenge, I evaluate the factors that can affect the performance of log parsing on non-English logs and propose a framework for parsing non-English logs. For the third challenge, I use the semantic knowledge of the dynamic information to further enrich the output structure of log parsing techniques for downstream tasks.
I expect that my study will not only help practitioners apply log parsing techniques in practice but also bring log parsing techniques to more downstream tasks.
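
To make the entropy idea in the abstract concrete, here is a minimal Python sketch. It is an illustration only and should not be read as the author's technique: the grouping by token count, the `<*>` placeholder, the function names, and the fixed `threshold` are all assumptions made for this example (the abstract describes the actual approach as parameter-insensitive, which this sketch is not). The intuition is that a token position whose values vary widely across similar log lines has high Shannon entropy and is likely a dynamic variable, while a position that stays constant has zero entropy and is likely static text.

```python
import math
from collections import Counter, defaultdict


def position_entropy(values):
    """Shannon entropy (in bits) of the tokens observed at one position."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_templates(lines, threshold=0.5):
    """Return one template per token-count group.

    Hypothetical simplification: lines with the same number of
    whitespace-separated tokens are assumed to share a template.
    `threshold` is an illustrative cut-off, not a value from the abstract.
    """
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        groups[len(tokens)].append(tokens)

    templates = {}
    for length, rows in groups.items():
        template = []
        for column in zip(*rows):  # all tokens seen at one position
            if position_entropy(column) > threshold:
                template.append("<*>")  # high entropy: dynamic variable
            else:
                # low entropy: keep the most frequent token as static text
                template.append(Counter(column).most_common(1)[0][0])
        templates[length] = " ".join(template)
    return templates
```

Calling `extract_templates` on a few lines such as `Connected to 10.0.0.1 port 22` and `Connected to 10.0.0.2 port 22` would mark the varying address position as `<*>` and keep the constant tokens as static text.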

Tue 29 Apr

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:00
Session 3: Maintenance (talks and panel) Doctoral Symposium at 212
Chair(s): Alexander Serebrenik Eindhoven University of Technology
14:00
6m
Talk
Concern-based Management of Software Design Complexity
Doctoral Symposium
Jason Lefever Drexel University
14:06
6m
Talk
Mitigating Waste That Tacitly Accrues in Continuous Integration Pipelines
Doctoral Symposium
Nimmi Rashinika Weeraddana University of Waterloo
Pre-print
14:12
6m
Talk
Automated Detection and Refactoring of Mock Clones in Java Projects
Doctoral Symposium
Gengwu Zhao Stevens Institute of Technology
14:18
6m
Talk
Practical Preprocessing of Logs at Scale
Doctoral Symposium
JianChen Zhao University of Waterloo
14:24
6m
Talk
Bridging the Gap Between Log Parsing Techniques and Practitioners: Challenges and Solutions
Doctoral Symposium
Hetong Dai University of Waterloo
14:30
30m
Panel
Panel: Maintenance
Doctoral Symposium
Sridhar Chimalakonda Indian Institute of Technology Tirupati, Wesley Assunção Johannes Kepler University Linz, Hetong Dai University of Waterloo, Jason Lefever Drexel University, Nimmi Weeraddana University of Waterloo, JianChen Zhao University of Waterloo, Gengwu Zhao Stevens Institute of Technology