When Parsing Goes Wrong: An Empirical Study of Error Propagation and Data Augmentation in Log Anomaly Detection
Log analysis is critical to the reliability and security of modern software systems, and log parsing and anomaly detection form the foundation of many operational monitoring pipelines. However, existing research has largely treated these two components in isolation, leaving the impact of parsing errors on downstream anomaly detection underexplored. In addition, the scarcity of labeled industrial logs has led to increasing reliance on public datasets and cross-dataset training, whose effectiveness in real-world settings remains unclear. In this paper, we present an empirical study that bridges log parsing and anomaly detection through a unified evaluation framework. We derive a taxonomy of six common log parsing error types by analyzing the outputs of multiple rule-based and LLM-based parsers across 16 public datasets and four industrial datasets. Through controlled correction experiments, we demonstrate that parsing errors propagate systematically to anomaly detection and that improving parsing quality yields consistent performance gains, particularly in data-scarce industrial environments. We further show that while carefully curated external logs can enhance detection performance, the benefits of data augmentation are highly model-dependent, and domain mismatch can lead to negative transfer. Our findings highlight the need to treat log parsing, data curation, and anomaly detection as a tightly coupled system design problem, and they provide practical guidance for building robust log analysis pipelines in real-world deployments.