LogFold: Compressing Logs with Structured Tokens and Hybrid Encoding
This program is tentative and subject to change.
Logs are essential for diagnosing failures and conducting retrospective studies, so many software organizations retain log messages for long periods. However, the volume of log data grows rapidly as software systems scale, which calls for an effective compression method. Beyond general-purpose compressors (e.g., Gzip, Bzip2), many recent studies have developed log-specific compression algorithms, but these achieve suboptimal performance because they (1) overlook redundancies within certain complex tokens and (2) lack a fine-grained encoding strategy for diverse token types.
This work uncovers a new redundancy pattern in structured tokens and proposes a type-aware encoding strategy to improve log compression. Building on these insights, we introduce LogFold, a novel log compression method with four components: a token analyzer that classifies tokens as structured, unstructured, or static; a processor that mines recurring patterns within structured tokens based on their delimiter skeletons; a hybrid encoder that tailors the data representation to each token type; and a packer that compresses the output into an archive file. Extensive experiments on 16 public log datasets show that LogFold surpasses state-of-the-art baselines, improving the average compression ratio by 11.11% at a compression speed of 9.842 MB/s. Ablation studies further confirm the contribution of each component, and sensitivity analyses verify LogFold's robustness and stability across various internal settings.
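To make the four-stage pipeline described above more concrete, here is a minimal Python sketch. It is not LogFold's implementation, and the abstract does not specify these details: the delimiter set, the rule that a token is "static" when it appears in a known template vocabulary, the `<*>` placeholder, the delta/dictionary split in the encoder, and the use of `lzma` as the packer backend are all illustrative assumptions. Every function name (`classify`, `skeleton`, `mine`, `encode_column`, `decode_column`, `pack`) is hypothetical.

```python
import json
import lzma
import re
from collections import defaultdict

# Hypothetical delimiter set; the abstract does not say which delimiters LogFold uses.
# The capturing group makes re.split keep the delimiters in its output.
DELIM_RE = re.compile(r"([.:/_\-=,@])")


def classify(token: str, template_vocab: set[str]) -> str:
    """Assumed rule: static if in the template vocabulary, structured if it
    contains a delimiter, otherwise unstructured."""
    if token in template_vocab:
        return "static"
    if DELIM_RE.search(token):
        return "structured"
    return "unstructured"


def skeleton(token: str) -> tuple[str, list[str]]:
    """Split a structured token into its delimiter skeleton and its fields."""
    parts = DELIM_RE.split(token)
    fields, delims = parts[0::2], parts[1::2]
    return "<*>" + "".join(d + "<*>" for d in delims), fields


def mine(tokens: list[str]) -> dict[str, list[list[str]]]:
    """Group structured tokens by skeleton and store their fields column-wise."""
    rows: dict[str, list[list[str]]] = defaultdict(list)
    for tok in tokens:
        skel, fields = skeleton(tok)
        rows[skel].append(fields)
    # Tokens sharing a skeleton have the same field count, so zip(*) transposes safely.
    return {skel: [list(col) for col in zip(*fs)] for skel, fs in rows.items()}


def _is_plain_int(v: str) -> bool:
    # Reject leading zeros so that the int -> str round trip is lossless.
    return v.isascii() and v.isdigit() and (v == "0" or v[0] != "0")


def encode_column(col: list[str]):
    """Assumed hybrid rule: delta-encode all-integer columns, dictionary-encode the rest."""
    if all(_is_plain_int(v) for v in col):
        nums = [int(v) for v in col]
        return ("delta", [nums[0]] + [b - a for a, b in zip(nums, nums[1:])])
    vocab: dict[str, int] = {}
    ids = [vocab.setdefault(v, len(vocab)) for v in col]
    return ("dict", list(vocab), ids)


def decode_column(enc) -> list[str]:
    """Invert encode_column (also accepts the list form produced by JSON)."""
    if enc[0] == "delta":
        out, acc = [], 0
        for d in enc[1]:
            acc += d
            out.append(str(acc))
        return out
    _, vocab, ids = enc
    return [vocab[i] for i in ids]


def pack(mined: dict[str, list[list[str]]]) -> bytes:
    """Stand-in packer: encode every column, serialize as JSON, compress with LZMA."""
    encoded = {skel: [encode_column(c) for c in cols] for skel, cols in mined.items()}
    return lzma.compress(json.dumps(encoded, sort_keys=True).encode("utf-8"))
```

For example, `mine(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8081"])` places all three tokens under the single skeleton `<*>.<*>.<*>.<*>:<*>`, and its fourth column `["1", "2", "3"]` delta-encodes to `[1, 1, 1]`. Storing fields column by column under a shared skeleton is one plausible way to expose the within-token redundancy the abstract refers to, since values that repeat across tokens end up adjacent before the general-purpose backend sees them.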
Wed 15 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
14:00 - 15:30 | Architecture and Design 1 | Research Track / SE In Practice (SEIP) at Oceania VIII | Chair(s): Klaus Schmid (University of Hildesheim)
14:00 | 15m Talk | Metronome: Differentiated Delay Scheduling for Serverless Functions | Research Track | Zhuangbin Chen (Sun Yat-sen University), Juzheng Zheng (School of Software Engineering, Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)
14:15 | 15m Talk | An Enterprise Marketplace for Unified Access to Multi-Cloud and Enterprise Products in a Large Banking Infrastructure | SE In Practice (SEIP) | Richard CASETTA (BNP Paribas; Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG), Thomas BRISBOUT (BNP Paribas), Jean-François TUR (BNP Paribas), Mariam Barry (BNP Paribas), Julien VEYBEL (BNP Paribas), Jean-Michel GARCIA (BNP Paribas), Nils GESBERT (Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG), Pierre GENEVES (Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG)
14:30 | 15m Talk | CCLInsight: Unveiling Insights in GPU Collective Communication Libraries via Primitive-Centric Analysis | Research Track | Liuyao Dai (University of California, Merced), Adam Weingram (University of California, Merced), Weicong Chen (University of California, Merced), Xiaoyi Lu (UC Merced)
14:45 | 15m Talk | FlowScope: Non-Intrusive Distributed Tracing with Method-Level Delay Estimation for Microservices Troubleshooting | Research Track | gyt (Tsinghua University), Han Zhang (Tsinghua University), Zhiheng Wu (Tsinghua University), Yahui Li (Tsinghua University, China), Jilong Wang (Tsinghua University), Xia Yin (Tsinghua University)
15:00 | 15m Talk | LogFold: Compressing Logs with Structured Tokens and Hybrid Encoding | Research Track | Shiwen Shan (Sun Yat-sen University), Yintong Huo (Singapore Management University, Singapore), Hongzhan Zhong (Sun Yat-sen University), Zhining Wang (Sun Yat-sen University), Yuxin Su (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)
15:15 | 15m Talk | Relax with Capybaras | Research Track