Practical Preprocessing of Logs at Scale (ICSE 2025 - Doctoral Symposium)

Track

ICSE 2025 Doctoral Symposium

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

When

Tue 29 Apr 2025 14:18 - 14:24 at 212 - Session 3: Maintenance (talks and panel) Chair(s): Alexander Serebrenik

Abstract

Logs are diverse in structure and large in volume. Although they contain important information about systems at runtime, they must be preprocessed before analysis can be performed. First, logs need to be parsed into a useful format; second, they often need to be separated into groups before analysis to reduce noise. We identify two challenges in adopting log preprocessing at scale: the preprocessing steps must be generalizable to handle the diversity and evolving nature of logs, and efficient enough to keep up with the large volume produced by applications. To tackle these challenges, we first study the use of identifiers for log grouping. We then propose an alternative approach based on interleaved sequence models. We also investigate log parsing on console logs, a type of log whose parsing is not well studied. Finally, we propose a log parsing technique based on entropy estimated with a language model.
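
To make the two preprocessing steps named in the abstract concrete, the following is a minimal sketch of conventional parsing and identifier-based grouping. It is not the author's technique: the sample log lines, the `req-<n>` identifier format, and the rule that any token containing a digit is a variable are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample lines; the leading "req-<n>" identifier is an assumption.
LOGS = [
    "req-1 Connected to 10.0.0.5 in 12 ms",
    "req-2 Connected to 10.0.0.9 in 48 ms",
    "req-1 Sent 512 bytes",
    "req-2 Sent 2048 bytes",
]

# Crude heuristic (assumption): any token containing a digit is a variable.
VARIABLE = re.compile(r"\S*\d\S*")


def parse(message):
    """Split a message into a constant template and its variable values."""
    params = VARIABLE.findall(message)
    template = VARIABLE.sub("<*>", message)
    return template, params


def group_by_identifier(lines):
    """Group parsed events by the leading identifier token of each line."""
    groups = defaultdict(list)
    for line in lines:
        identifier, _, message = line.partition(" ")
        groups[identifier].append(parse(message))
    return dict(groups)


groups = group_by_identifier(LOGS)
for identifier, events in groups.items():
    print(identifier, [template for template, _ in events])
```

In this sketch both requests reduce to the same sequence of templates once variables are masked, which is the kind of noise reduction grouping is meant to provide. A fixed heuristic like this one illustrates the generalizability challenge the abstract raises, since it breaks as soon as log formats or identifier conventions change.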

Session Program

Tue 29 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:00	Session 3: Maintenance (talks and panel) Doctoral Symposium at 212 Chair(s): Alexander Serebrenik, Eindhoven University of Technology

14:00 6m Talk		Concern-based Management of Software Design Complexity Doctoral Symposium Jason Lefever Drexel University
14:06 6m Talk		Mitigating Waste That Tacitly Accrues in Continuous Integration Pipelines Doctoral Symposium Nimmi Rashinika Weeraddana University of Waterloo Pre-print
14:12 6m Talk		Automated Detection and Refactoring of Mock Clones in Java Projects Doctoral Symposium Gengwu Zhao Stevens Institute of Technology
14:18 6m Talk		Practical Preprocessing of Logs at Scale Doctoral Symposium JianChen Zhao University of Waterloo
14:24 6m Talk		Bridging the Gap Between Log Parsing Techniques and Practitioners: Challenges and Solutions Doctoral Symposium Hetong Dai University of Waterloo
14:30 30m Panel		Panel: Maintenance Doctoral Symposium Sridhar Chimalakonda Indian Institute of Technology Tirupati, Wesley Assunção Johannes Kepler University Linz, Hetong Dai University of Waterloo, Jason Lefever Drexel University, Nimmi Weeraddana University of Waterloo, JianChen Zhao University of Waterloo, Gengwu Zhao Stevens Institute of Technology

JianChen Zhao

University of Waterloo
