KnowLog: Knowledge Enhanced Pre-trained Language Model for Log Understanding (ICSE 2024 - Research Track)

Who

Lipeng Ma, Weidong Yang, Bo Xu, Sihang Jiang, Ben Fei, Jiaqing Liang, Mingjie Zhou, Yanghua Xiao

Track

ICSE 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 Apr 2024 11:15 - 11:30 at Maria Helena Vieira da Silva - Language Models and Generated Code 1 Chair(s): Yiling Lou

Abstract

Logs as semi-structured text are rich in semantic information, making their comprehensive understanding crucial for automated log analysis. With the recent success of pre-trained language models in natural language processing, many studies have leveraged these models to understand logs. Despite their successes, existing pre-trained language models still suffer from three weaknesses. Firstly, these models fail to understand domain-specific terminology, especially abbreviations. Secondly, these models struggle to adequately capture the complete log context information. Thirdly, these models have difficulty in obtaining universal representations of different styles of the same logs. To address these challenges, we introduce KnowLog, a knowledge-enhanced pre-trained language model for log understanding. Specifically, to solve the first two challenges, we exploit abbreviations and natural language descriptions of logs from public documentation as local and global knowledge, respectively, and leverage this knowledge by designing novel pre-training tasks for enhancing the model. To solve the last challenge, we design a contrastive learning-based pre-training task to obtain universal representations. We evaluate KnowLog by fine-tuning it on six different log understanding tasks. Extensive experiments demonstrate that KnowLog significantly enhances log understanding and achieves state-of-the-art results compared to existing pre-trained language models without knowledge enhancement. Moreover, we conduct additional experiments in transfer learning and low-resource scenarios, showcasing the substantial advantages of KnowLog. Our source code and detailed experimental data are available at https://github.com/LeaperOvO/KnowLog.
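
The abstract does not spell out the contrastive pre-training objective, so the following is only a minimal sketch of one common formulation (an InfoNCE-style loss with in-batch negatives), not KnowLog's actual implementation. It assumes that each log has two embeddings, one per stylistic variant, and that the other logs in the batch serve as negatives. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is assumed to be the other-style embedding of the same log
    as anchors[i]; every other row of `positives` acts as a negative.
    The temperature is an illustrative value, not one taken from the paper.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        # Cross-entropy with the matching pair as the correct class.
        total += log_sum - logits[i]
    return total / len(anchors)


# Toy 2-d embeddings standing in for encoder outputs (hypothetical data).
style_a = [[1.0, 0.0], [0.0, 1.0]]
style_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce_loss(style_a, style_b)
shuffled = info_nce_loss(style_a, list(reversed(style_b)))
print(aligned < shuffled)
```

In this toy setup the loss is small when the two styles of the same log are embedded close together and large when the pairing is broken, which is the behaviour a contrastive objective relies on to pull style variants toward a shared representation.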

Lipeng Ma

Fudan University

Weidong Yang

Fudan University

Bo Xu

Donghua University

Sihang Jiang

Fudan University

Ben Fei

Fudan University

Jiaqing Liang

Fudan University

Mingjie Zhou

Fudan University

Yanghua Xiao

Fudan University

Session Program

Wed 17 Apr
Displayed time zone: Lisbon

11:00 - 12:30	Language Models and Generated Code 1 Research Track / New Ideas and Emerging Results at Maria Helena Vieira da Silva Chair(s): Yiling Lou Fudan University

11:00 15m Talk		Modularizing while Training: a New Paradigm for Modularizing DNN Models Research Track Binhang Qi Beihang University, Hailong Sun Beihang University, Hongyu Zhang Chongqing University, Ruobing Zhao Beihang University, Xiang Gao Beihang University
11:15 15m Research paper		KnowLog: Knowledge Enhanced Pre-trained Language Model for Log Understanding Research Track Lipeng Ma Fudan University, Weidong Yang Fudan University, Bo Xu Donghua University, Sihang Jiang Fudan University, Ben Fei Fudan University, Jiaqing Liang Fudan University, Mingjie Zhou Fudan University, Yanghua Xiao Fudan University
11:30 15m Talk		FAIR: Flow Type-Aware Pre-Training of Compiler Intermediate Representations Research Track Changan Niu Software Institute, Nanjing University, Chuanyi Li Nanjing University, Vincent Ng Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688, David Lo Singapore Management University, Bin Luo Nanjing University
11:45 15m Talk		Unveiling Memorization in Code Models Research Track Zhou Yang Singapore Management University, Zhipeng Zhao Singapore Management University, Chenyu Wang Singapore Management University, Jieke Shi Singapore Management University, Dongsun Kim Kyungpook National University, DongGyun Han Royal Holloway, University of London, David Lo Singapore Management University
12:00 15m Talk		Code Search is All You Need? Improving Code Suggestions with Code Search Research Track Junkai Chen Zhejiang University, Xing Hu Zhejiang University, Zhenhao Li Concordia University, Cuiyun Gao Harbin Institute of Technology, Xin Xia Huawei Technologies, David Lo Singapore Management University
12:15 7m Talk		Expert Monitoring: Human-Centered Concept Drift Detection in Machine Learning Operations New Ideas and Emerging Results Joran Leest Vrije Universiteit Amsterdam, Claudia Raibulet Vrije Universiteit Amsterdam, Ilias Gerostathopoulos Vrije Universiteit Amsterdam, Patricia Lago Vrije Universiteit Amsterdam