InsightAI: Root Cause Analysis in Large Hierarchical Log Files with Private Data Using Large Language Models
Abstract—[Problem] As industries increasingly depend on complex software systems, efficient log analysis is essential for maintaining reliability and privacy. However, identifying problems through logs is often time-consuming and costly for developers. [Background] Large language models (LLMs) can automate parts of log analysis, but challenges persist, such as limited computational resources and the frequent need to retrain models due to the dynamic nature of software logs. External LLMs, such as GPT models, combined with in-context learning techniques, can mitigate some of these issues, but other challenges remain, including token limitations, high token costs, and data privacy. [Method] To tackle these challenges, we developed an automated pipeline that extracts log files and employs in-context learning, allowing the model to adapt efficiently to changes without extensive retraining. Our approach introduces a novel flame-graph-like method that reduces token usage, thereby lowering token-related costs and response latency while maintaining high accuracy. [Results] This solution allows industries to automate log analysis, minimize system downtime, and enhance performance, all while preserving data privacy and operational efficiency. [Conclusion] Our flame-graph-like methodology reduces input tokens by 93.61% and processing latency by 77.45%. Our anonymization results show an improvement of 138.63% over the baseline. This industrial experience report presents our approach, which allows industries to balance token costs, maintain response accuracy, and ensure data privacy while relying on external LLMs without managing computational resources directly.