Unleashing the True Potential of Semantic-based Log Parsing with Pre-trained Language Models (ICSE 2025 - Research Track)

Who

Van-Hoang Le, Yi Xiao, Hongyu Zhang

Track

ICSE 2025 Research Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

Use conference time zone: (GMT-04:00) Eastern Time (US & Canada)Select other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Wed 30 Apr 2025 17:15 - 17:30 at 213 - AI for Program Comprehension 1 Chair(s): Yintong Huo

Abstract

Software-intensive systems often produce console logs for troubleshooting purpose. Log parsing, which aims at parsing a log message into a specific log template, typically serves as the first step toward automated log analytics. To better comprehend semantic information of log messages, many semantic-based log parsers have been proposed. These log parsers fine-tune a small pretrained language model (PLM) such as RoBERTa on a few labelled log samples. With the increasing popularity of large language models (LLMs), some recent studies also propose to leverage LLMs such as ChatGPT through in-context learning for automated log parsing, and obtain better results than previous semantic-based log parsers with small PLMs. In this paper, we show that semantic-based log parsers with small PLMs can actually achieve better or comparable performance to state-of-the-art LLM-based log parsing models while being more efficient and cost-effective. We propose UNLEASH, a novel semantic-based log parsing approach, which incorporates three enhancement methods to boost the performance of PLMs for log parsing, including (1) an entropy-based ranking method to select the most informative log samples; (2) a contrastive learning method to enhance the fine-tuning process; and (3) an inference optimization method to improve the log parsing performance. We evaluate UNLEASH on a set of large log datasets and the experimental results show that UNLEASH is effective and efficient, when compared to state-of-the-art log parsers.

Van-Hoang Le

The University of Newcastle

Australia

Yi Xiao

Hongyu Zhang

Chongqing University

China