ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

Modern software systems generate massive volumes of runtime logs, necessitating efficient and accurate log parsing to enable critical downstream tasks such as anomaly detection and root cause analysis. Recently, large language models (LLMs) have achieved high accuracy on log parsing, but their deployment in production environments faces two major limitations: 1) the privacy risks associated with commercial LLMs, which drive the adoption of local deployment, and 2) the stringent latency and throughput requirements imposed by high-volume log streams, which existing LLM-based parsers fail to meet. Although recent efforts have reduced the number of LLM queries, they overlook the high latency of LLM invocations, where concurrent log parsing requests can cause severe performance degradation of the LLM inference system.

In this study, we present InferLog, the first LLM inference optimization method for online log parsing. InferLog accelerates inference through two designs: 1) a prefix-aware ICL refinement policy that refines the examples and their permutation in in-context learning to improve prefix caching efficiency, and 2) a rapid, task-specific configuration tuning pipeline based on meta-learning that finds the optimal scheduling-related LLM configuration for dynamic log parsing workloads. Experimental results on the Loghub-2k dataset with vLLM demonstrate that InferLog significantly outperforms existing inference optimization methods and markedly accelerates the state-of-the-art LLM-based log parser without compromising parsing accuracy.
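To make the two ideas concrete, below is a minimal illustrative sketch, not the authors' implementation, of how a prefix-aware prompt layout and scheduling-related configuration interact with vLLM's automatic prefix caching. The model name, the demonstration pairs, and the scheduling values (max_num_seqs, max_num_batched_tokens) are placeholder assumptions chosen for illustration.

```python
# Illustrative sketch only: the demo pairs, model choice, and scheduling
# values are assumptions, not settings from the InferLog paper.
from vllm import LLM, SamplingParams

# Scheduling-related knobs like these are the kind of configuration a
# tuning pipeline would search over; the numbers here are placeholders.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    enable_prefix_caching=True,        # reuse KV cache across shared prompt prefixes
    max_num_seqs=64,                   # placeholder scheduling config
    max_num_batched_tokens=8192,       # placeholder scheduling config
)

# Keeping the in-context demonstrations in one canonical order means every
# request shares the longest possible prompt prefix, so the engine can serve
# the shared portion from the prefix cache instead of recomputing it.
DEMOS = [
    ("Received block blk_3587508140051953248 of size 67108864",
     "Received block <*> of size <*>"),
    ("PacketResponder 1 for block blk_38865049064139660 terminating",
     "PacketResponder <*> for block <*> terminating"),
]

INSTRUCTION = "Extract the log template, replacing variable fields with <*>.\n"

def build_prompt(log_line: str) -> str:
    """Shared instruction and demos first, query log last, to maximize the
    common prefix across concurrent parsing requests."""
    demos = "".join(f"Log: {log}\nTemplate: {tpl}\n" for log, tpl in DEMOS)
    return f"{INSTRUCTION}{demos}Log: {log_line}\nTemplate:"

logs = [
    "Received block blk_5402003568334525940 of size 67108864",
    "PacketResponder 2 for block blk_8229193803249955061 terminating",
]
params = SamplingParams(temperature=0.0, max_tokens=64)
for out in llm.generate([build_prompt(log) for log in logs], params):
    print(out.outputs[0].text.strip())
```

Because every prompt here begins with an identical instruction-plus-demonstration block, the engine computes that prefix once and reuses its KV cache across concurrent requests; the abstract's refinement policy goes further by adjusting which examples appear and in what permutation so that such shared prefixes are preserved per query.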