Logs generated by large-scale software systems contain a huge amount of useful information. As the first step of automated log analysis, log parsing has been extensively studied. General log parsing techniques focus on identifying static templates from raw logs, but overlook the more important semantics implied in dynamic log parameters. With the popularity of Artificial Intelligence for IT Operations (AIOps), traditional log parsing methods no longer meet the requirements of various downstream tasks. Researchers are now exploring the next generation of log parsing techniques, i.e., semantic log parsing, to identify both log templates and semantics in log parameters. However, the absence of semantic annotations in existing datasets hinders the training and evaluation of semantic log parsers, thereby stalling the progress of semantic log parsing.
To fill this gap and advance the field of semantic log parsing, we construct LogBase, the first semantic log parsing benchmark dataset. LogBase consists of logs from 130 popular open-source projects, containing 85,300 semantically annotated log templates, surpassing existing datasets in both log source diversity and template richness. To build LogBase, we develop GenLog, a framework for constructing semantic log parsing datasets. GenLog mines log template-parameter-context triplets from popular open-source repositories on GitHub, and uses chain-of-thought (CoT) techniques with large language models (LLMs) to generate high-quality logs. Meanwhile, GenLog employs human feedback to improve the quality of the generated data and ensure its reliability. GenLog is highly automated and cost-effective, enabling researchers to easily and efficiently construct semantic log parsing datasets. Furthermore, we design a set of comprehensive evaluation metrics for LogBase, including general log parser metrics and metrics specific to semantic log parsers and LLM-based parsers.
With LogBase, we extensively evaluate 15 existing log parsers, revealing their true performance in complex scenarios. We believe that this work provides researchers with valuable data, reliable tools, and insightful findings to support and guide future research on semantic log parsing.
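To make the distinction drawn in the abstract concrete, the following minimal sketch contrasts conventional template extraction with semantic parsing on a single log line. The log line, the regular expression, and the parameter labels (`client_ip`, `port`, `duration_ms`) are hypothetical and chosen for illustration only; they are not taken from LogBase, and the sketch does not reflect how GenLog or any evaluated parser works.

```python
import re

# Hypothetical raw log line and hand-written pattern; neither comes from LogBase.
RAW_LOG = "Connection from 10.0.0.5 port 22 closed after 35 ms"

# The named groups double as illustrative semantic labels for each parameter.
PATTERN = re.compile(
    r"Connection from (?P<client_ip>\S+) port (?P<port>\d+) "
    r"closed after (?P<duration_ms>\d+) ms"
)


def parse_template(line):
    """Conventional parsing: replace every dynamic parameter with <*>."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    template = line
    # Replace spans from right to left so earlier offsets stay valid.
    for index in range(len(match.groups()), 0, -1):
        start, end = match.span(index)
        template = template[:start] + "<*>" + template[end:]
    return template


def parse_semantic(line):
    """Semantic parsing: return the template plus labeled parameter values."""
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    return {"template": parse_template(line), "parameters": match.groupdict()}


if __name__ == "__main__":
    print(parse_template(RAW_LOG))
    print(parse_semantic(RAW_LOG))
```

A template-only parser stops at the wildcard form, while a semantic parser also reports what each wildcard means, which is the information downstream tasks would consume.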
Wed 25 Jun
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
| Time | Session and talks |
| --- | --- |
| 14:00 - 15:15 | Runtime Analysis, Verification, and Slicing. Research Papers at Aurora C. Chair(s): Heqing Huang (City University of Hong Kong) |
| 14:00, 25m Talk | Adding Spatial Memory Safety to EDK II through Checked C (Experience Paper). Research Papers. Sourag Cherupattamoolayil (Purdue University), Arunkumar Bhattar (Purdue University), Connor Everett Glosner (Purdue University), Aravind Machiry (Purdue University). DOI |
| 14:25, 25m Talk | LogBase: A Large-Scale Benchmark for Semantic Log Parsing. Research Papers. Chenbo Zhang (Fudan University), Wenying Xu (Fudan University), Jinbu Liu (Alibaba), Lu Zhang (Fudan University), Guiyang Liu (Alibaba), Jihong Guan (Tongji University), Qi Zhou (Alibaba), Shuigeng Zhou (Fudan University). DOI |
| 14:50, 25m Talk | Static Program Reduction via Type-Directed Slicing. Research Papers. Loi Ngo Duc Nguyen (University of California, Riverside), Tahiatul Islam (New Jersey Institute of Technology), Theron Wang (The Academy for Mathematics, Science & Engineering, USA), Sam Lenz (New Jersey Institute of Technology), Martin Kellogg (New Jersey Institute of Technology). DOI, Pre-print |
Aurora C is the third room in the Aurora wing.
When facing the main Cosmos Hall, access to the Aurora wing is on the right, close to the side entrance of the hotel.