Bridging the Gap Between Log Parsing Techniques and Practitioners: Challenges and Solutions
Logs usually contain rich information about the run-time behavior of a software system, and prior research has proposed various log-based software analysis techniques. Log parsing is the first step of such techniques: it transforms logs from unstructured text into categorical data with a structured format. Because log files are usually large, many automated log parsing techniques have been proposed. However, applying log parsing techniques in practice still faces many challenges. I divide these challenges into two categories: 1) evaluation-related challenges and 2) practical-application-related challenges. The former make it hard for practitioners to choose a suitable log parsing technique, and the latter make it hard for practitioners to apply log parsing techniques in practice. I identify one evaluation-related challenge in this paper: the datasets used in evaluation benchmarks for log parsing techniques are limited. To address this challenge, I propose a semi-automatic approach to generate oracle templates for extra-large log datasets; the oracle templates can then be used to generate ground truth for log parsing benchmarks. I also identify three practical-application-related challenges: 1) insufficient knowledge to configure parsing tools, 2) incompatibility with non-English logs, and 3) the semantic knowledge of the dynamic information is usually not encapsulated. To address the first challenge, I propose a parameter-insensitive log parsing technique that uses entropy to distinguish dynamic variables from static text. To address the second challenge, I evaluate the factors that can affect the performance of log parsing on non-English logs and propose a framework for parsing non-English logs. For the third challenge, I use the semantic knowledge of the dynamic information to further enrich the output structure of log parsing techniques for downstream tasks.
I expect that my study will not only help practitioners apply log parsing techniques in practice but also bring log parsing techniques to more downstream tasks.
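To make the idea of using entropy to separate static text from dynamic variables more concrete, the following is a minimal sketch and not the technique proposed in the paper. It assumes whitespace tokenization, groups log lines only by token count, and uses a hypothetical `threshold` parameter and hypothetical function names (`position_entropy`, `extract_templates`); the actual approach is described as parameter-insensitive and likely differs in all of these respects.

```python
import math
from collections import Counter, defaultdict


def position_entropy(tokens):
    """Shannon entropy (in bits) of the tokens observed at one position."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_templates(lines, threshold=0.5):
    """Return one template per token count (illustrative simplification).

    Positions whose token entropy exceeds `threshold` (an assumed value)
    are treated as dynamic variables and replaced with "<*>"; the
    remaining positions keep their most frequent token as static text.
    """
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        groups[len(tokens)].append(tokens)

    templates = {}
    for length, rows in groups.items():
        template = []
        # zip(*rows) iterates over token positions (columns) in the group.
        for column in zip(*rows):
            if position_entropy(column) > threshold:
                template.append("<*>")
            else:
                template.append(Counter(column).most_common(1)[0][0])
        templates[length] = " ".join(template)
    return templates


logs = [
    "Connection from 10.0.0.1 closed",
    "Connection from 10.0.0.2 closed",
    "Connection from 10.0.0.3 closed",
    "Worker 7 started",
    "Worker 9 started",
]
templates = extract_templates(logs)
for length in sorted(templates):
    print(length, templates[length])
```

In this sketch, positions that always hold the same token have zero entropy and are kept as static text, while positions that vary across lines (the IP address, the worker ID) have high entropy and become `<*>` placeholders. Grouping by token count alone would merge unrelated templates of equal length, so a realistic parser needs a stronger grouping step.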
Tue 29 Apr (displayed time zone: Eastern Time, US & Canada)
14:00 - 15:00 | Session 3: Maintenance (talks and panel) | Doctoral Symposium at 212 | Chair(s): Alexander Serebrenik (Eindhoven University of Technology)
14:00 | 6m | Talk | Concern-based Management of Software Design Complexity | Jason Lefever (Drexel University)
14:06 | 6m | Talk | Mitigating Waste That Tacitly Accrues in Continuous Integration Pipelines | Nimmi Rashinika Weeraddana (University of Waterloo)
14:12 | 6m | Talk | Automated Detection and Refactoring of Mock Clones in Java Projects | Gengwu Zhao (Stevens Institute of Technology)
14:18 | 6m | Talk | Practical Preprocessing of Logs at Scale | JianChen Zhao (University of Waterloo)
14:24 | 6m | Talk | Bridging the Gap Between Log Parsing Techniques and Practitioners: Challenges and Solutions | Hetong Dai (University of Waterloo)
14:30 | 30m | Panel | Panel: Maintenance | Sridhar Chimalakonda (Indian Institute of Technology Tirupati), Wesley Assunção (Johannes Kepler University Linz), Hetong Dai (University of Waterloo), Jason Lefever (Drexel University), Nimmi Weeraddana (University of Waterloo), JianChen Zhao (University of Waterloo), Gengwu Zhao (Stevens Institute of Technology)