Bridging the Gap Between Log Parsing Techniques and Practitioners: Challenges and Solutions
Logs usually contain rich information about the run-time behavior of a software system, and various log-based software analysis techniques have been proposed in prior research. Log parsing is the first step of these techniques: it transforms logs from unstructured text into categorical data with a structured format. Because log files are usually large, many automated log parsing techniques have been proposed. However, applying log parsing techniques in practice still faces many challenges. I divide these challenges into two categories: 1) evaluation-related challenges and 2) practical-application-related challenges. The former make it hard for practitioners to choose a suitable log parsing technique, and the latter make it hard for practitioners to apply log parsing techniques in practice. I identify one evaluation-related challenge in this paper: the datasets used in evaluation benchmarks for log parsing techniques are limited. To address this challenge, I propose a semi-automatic approach that generates oracle templates for extra-large log datasets; the oracle templates can then be used to generate ground truth for log parsing benchmarks. I also identify three practical-application-related challenges: 1) insufficient knowledge to configure parsing tools, 2) incompatibility with non-English logs, and 3) the semantic knowledge of the dynamic information is usually not encapsulated. To address the first challenge, I propose a parameter-insensitive log parsing technique that uses entropy to distinguish dynamic variables from static text. To address the second challenge, I evaluate the factors that can affect log parsing performance on non-English logs and propose a framework for parsing non-English logs. For the third challenge, I use the semantic knowledge of the dynamic information to further enrich the output structure of log parsing techniques for downstream tasks.
I expect that my study will not only help practitioners apply log parsing techniques in practice but also bring log parsing techniques to more downstream tasks.
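
To make the idea of using entropy to separate dynamic variables from static text concrete, the following is a minimal sketch and not the technique proposed in this work. It assumes whitespace tokenization, groups log lines by token count, and treats a token position as dynamic when the Shannon entropy of the tokens observed there exceeds a cutoff. The names `shannon_entropy` and `extract_templates`, the `<*>` placeholder, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical distribution of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def extract_templates(lines, threshold=0.5):
    """Map each log line to a template with dynamic positions masked.

    Assumption (illustrative only): lines with the same token count are
    compared position by position, and a position whose entropy exceeds
    `threshold` is considered a dynamic variable.
    """
    # Group tokenized lines by their number of tokens.
    groups = defaultdict(list)
    for line in lines:
        tokens = line.split()
        groups[len(tokens)].append(tokens)

    templates = {}
    for rows in groups.values():
        # Transpose so that each column holds all tokens seen at one position.
        columns = list(zip(*rows))
        dynamic = [shannon_entropy(col) > threshold for col in columns]
        for row in rows:
            masked = ["<*>" if is_dyn else tok for tok, is_dyn in zip(row, dynamic)]
            templates[" ".join(row)] = " ".join(masked)
    return templates


if __name__ == "__main__":
    logs = [
        "Connected to 10.0.0.1",
        "Connected to 10.0.0.2",
        "Connected to 10.0.0.3",
    ]
    for raw, template in extract_templates(logs).items():
        print(raw, "=>", template)
```

In this sketch the first two positions always hold the same token, so their entropy is zero and they are kept as static text, while the third position varies across lines and is masked. A fixed cutoff is itself a parameter, and grouping only by token count can mix unrelated templates, so this should be read as an illustration of the intuition rather than a parameter-insensitive parser.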