LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models
This program is tentative and subject to change.
Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but their accuracy often drops when they process logs that deviate from the predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and the limited context size of LLMs, and 3) privacy risks from using commercial models like ChatGPT with sensitive log information. To overcome these limitations, this paper introduces LibreLog, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. LibreLog first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity scoring-based retrieval augmented generation, which selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection, which iteratively queries LLMs to refine log templates and improve parsing accuracy; and iii) log template memory, which stores parsed templates to reduce LLM queries and improve parsing efficiency. Our evaluation on LogHub-2.0 shows that LibreLog achieves 25% higher parsing accuracy and processes logs 2.7 times faster than state-of-the-art LLM-based parsers. In short, LibreLog addresses the privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
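To make two of the components named in the abstract more concrete, here is a minimal Python sketch of Jaccard-based selection of diverse logs and of a log template memory. It is an illustration under stated assumptions, not the authors' implementation: the whitespace tokenization, the greedy "least similar to what is already chosen" strategy, the `<*>` variable placeholder, and the names `jaccard`, `select_diverse`, and `TemplateMemory` are all hypothetical choices made for this example. The grouping tree and the LLM calls are left out.

```python
import re
from typing import Optional


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the whitespace-token sets of two log lines."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def select_diverse(logs: list[str], k: int) -> list[str]:
    """Greedily pick up to k logs from one group (assumed strategy).

    Start from the first log, then repeatedly add the candidate whose
    highest similarity to any already-chosen log is the lowest.
    """
    if not logs or k <= 0:
        return []
    chosen = [logs[0]]
    # Drop exact duplicates while keeping the original order.
    candidates = [c for c in dict.fromkeys(logs) if c != logs[0]]
    while candidates and len(chosen) < k:
        best = min(candidates, key=lambda c: max(jaccard(c, s) for s in chosen))
        chosen.append(best)
        candidates.remove(best)
    return chosen


class TemplateMemory:
    """Stores parsed templates so matching logs can skip the LLM query.

    '<*>' marks a dynamic variable (placeholder assumed for this sketch).
    """

    def __init__(self) -> None:
        self._patterns: list[tuple[re.Pattern, str]] = []

    def add(self, template: str) -> None:
        # Escape the static text and let each placeholder match any non-empty run.
        static_parts = [re.escape(part) for part in template.split("<*>")]
        self._patterns.append((re.compile(".+?".join(static_parts)), template))

    def lookup(self, log: str) -> Optional[str]:
        """Return the first stored template that fully matches the log, else None."""
        for pattern, template in self._patterns:
            if pattern.fullmatch(log):
                return template
        return None


group = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.1 port 80",
    "Connected to 10.0.0.2 port 443",
]
examples = select_diverse(group, k=2)

memory = TemplateMemory()
memory.add("Connected to <*> port <*>")
hit = memory.lookup("Connected to 10.0.0.9 port 8080")
miss = memory.lookup("Disconnected from 10.0.0.1")
```

In this toy group, the third log differs from the first in both the address and the port, so it is the one picked as the second example; showing the LLM logs that vary in more positions is the intuition behind selecting diverse samples. A log that matches a stored template is answered from memory, and only a miss would need a new LLM query.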
Wed 30 Apr | Displayed time zone: Eastern Time (US & Canada)
16:00 - 17:30 | AI for Program Comprehension 1 | Research Track at 213 | Chair(s): Yintong Huo (Singapore Management University, Singapore)
16:00 | 15m | Talk | ADAMAS: Adaptive Domain-Aware Performance Anomaly Detection in Cloud Service Systems | Research Track | Wenwei Gu (The Chinese University of Hong Kong), Jiazhen Gu (Chinese University of Hong Kong), Jinyang Liu (Chinese University of Hong Kong), Zhuangbin Chen (Sun Yat-sen University), Jianping Zhang (The Chinese University of Hong Kong), Jinxi Kuang (The Chinese University of Hong Kong), Cong Feng (Huawei Cloud Computing Technology), Yongqiang Yang (Huawei Cloud Computing Technology), Michael Lyu (The Chinese University of Hong Kong)
16:15 | 15m | Talk | LibreLog: Accurate and Efficient Unsupervised Log Parsing Using Open-Source Large Language Models | Research Track | Zeyang Ma (Concordia University), Dong Jae Kim (DePaul University), Tse-Hsun (Peter) Chen (Concordia University)
16:30 | 15m | Talk | Model Editing for LLMs4Code: How Far are We? | Research Track | Xiaopeng Li (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Shasha Li (National University of Defense Technology), Jun Ma (National University of Defense Technology), Jie Yu (National University of Defense Technology), Xiaodong Liu (National University of Defense Technology), Jing Wang (National University of Defense Technology), Bin Ji (National University of Defense Technology), Weimin Zhang (National University of Defense Technology) | Pre-print
16:45 | 15m | Talk | Software Model Evolution with Large Language Models: Experiments on Simulated, Public, and Industrial Datasets | Research Track | Christof Tinnes (Saarland University), Alisa Carla Welter (Saarland University), Sven Apel (Saarland University) | Pre-print
17:00 | 15m | Talk | SpecRover: Code Intent Extraction via LLMs | Research Track | Haifeng Ruan (National University of Singapore), Yuntong Zhang (National University of Singapore), Abhik Roychoudhury (National University of Singapore)
17:15 | 15m | Talk | Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models | Research Track