An Exploratory Study on How Non-Determinism in Large Language Models Affects Log Parsing
Most software systems used in production generate system logs that provide a rich source of information about the status and execution behavior of the system. These logs are commonly used to ensure the reliability and maintainability of software systems. The first step toward automated log analysis is generally log parsing, which aims to transform unstructured log messages into structured log templates and extract the corresponding parameters. Recently, Large Language Models (LLMs) such as ChatGPT have shown promising results on a wide range of software engineering tasks, including log parsing. However, the extent to which non-determinism influences log parsing using LLMs remains unclear. In particular, it is important to investigate whether LLMs behave consistently when faced with the same log message multiple times. In this study, we investigate the impact of non-determinism in state-of-the-art LLMs while performing log parsing. Specifically, we select six LLMs, including both paid proprietary and free-to-use models, and evaluate their non-determinism on 16 system logs obtained from a selection of mature open-source projects. The results of our study reveal varying degrees of non-determinism among models. Moreover, they show that there is no guarantee of deterministic results, even with a temperature of zero.
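The study checks whether repeated queries with the same log message yield identical templates. A minimal sketch of one plausible way to quantify this is shown below; the function name, the `<*>` parameter placeholder, and the example templates are illustrative assumptions, not the paper's actual metric or data:

```python
from collections import Counter

def template_consistency(outputs):
    """Fraction of repeated parses that match the most frequent template.

    1.0 means the model answered identically on every run for this log
    message; lower values indicate non-determinism across repeated queries.
    """
    if not outputs:
        raise ValueError("need at least one parsed template")
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / len(outputs)

# Example: five repeated parses of the same log message, where one run
# produced a diverging template (values are made up for illustration).
runs = [
    "Connection from <*> closed",
    "Connection from <*> closed",
    "Connection from <*> closed",
    "Connection from <*>:<*> closed",
    "Connection from <*> closed",
]
print(template_consistency(runs))  # 0.8
```

In a real experiment, each entry in `runs` would come from a separate LLM API call with the same prompt and a fixed temperature, so the metric directly reflects the model's (non-)determinism.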
Mon 15 Apr (displayed time zone: Lisbon)
11:00 - 12:30 | Late Morning Session (InteNSE) at Daciano da Costa
Chair(s): Reyhaneh Jabbarvand (University of Illinois at Urbana-Champaign), Saeid Tizpaz-Niari (University of Texas at El Paso)

11:00 (70m) | Keynote: Assured LLM-Based Software Engineering
Mark Harman (Meta Platforms, Inc. and UCL)

12:10 (20m) | Paper: An Exploratory Study on How Non-Determinism in Large Language Models Affects Log Parsing
Merve Astekin (Simula Research Laboratory), Max Hort (Simula Research Laboratory), Leon Moonen (Simula Research Laboratory and BI Norwegian Business School)