LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data
Log data is a crucial resource for recording system events and states during system execution. However, as systems grow in scale, log volume has grown explosively, reaching several petabytes per day in production and making log storage expensive. Log compression has therefore become a crucial task: it reduces disk storage while still allowing further log analysis. Unfortunately, existing general-purpose and log-specific compression methods make only limited use of the characteristics of log data. To overcome this limitation, we conduct an empirical study and identify three major observations on the characteristics of log data that can facilitate log compression. Based on these observations, we propose LogShrink, a novel and effective log compression method that leverages the commonality and variability of log data. An analyzer based on Longest Common Subsequence and entropy techniques identifies the latent commonality and variability in log messages. The key idea is that commonality and variability can be exploited to give log data a shorter representation. In addition, a clustering-based sequence sampler is introduced to accelerate the commonality and variability analyzer. Extensive experimental results demonstrate that LogShrink exceeds baselines in compression ratio by 16% to 356% on average while preserving a reasonable compression speed.
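To make the two techniques named in the abstract concrete, here is a minimal sketch of how a Longest Common Subsequence can expose the shared part of two log messages and how Shannon entropy can score how much a field varies. This is not LogShrink's implementation: the abstract does not describe the analyzer's details, so the whitespace tokenization, the function names `lcs` and `shannon_entropy`, and the sample log lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def lcs(a, b):
    """Return the longest common subsequence of two token lists (classic DP)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover the tokens themselves.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]


def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical log lines, tokenized on whitespace (an assumption).
line_a = "Connected to 10.0.0.1 port 22".split()
line_b = "Connected to 10.0.0.2 port 80".split()

# Tokens shared by both messages, in order: the "commonality".
print(lcs(line_a, line_b))  # ['Connected', 'to', 'port']

# Entropy of a field across messages: higher means more "variability".
print(shannon_entropy(["22", "80", "22", "443"]))
```

In this sketch the common tokens play the role of a template that would need to be stored only once, while the entropy score suggests which remaining fields vary little (and may compress well) and which vary a lot.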
Wed 17 Apr. Displayed time zone: Lisbon
16:00 - 17:30: Analytics 2 (Research Track / Journal-first Papers / Demonstrations) at Sophia de Mello Breyner Andresen. Chair(s): Grace Lewis (Carnegie Mellon Software Engineering Institute)

| Time | Duration | Type | Title | Track | Authors |
|---|---|---|---|---|---|
| 16:00 | 15m | Talk | LogShrink: Effective Log Compression by Leveraging Commonality and Variability of Log Data | Research Track | Xiaoyun Li (Sun Yat-sen University); Hongyu Zhang (Chongqing University); Van-Hoang Le (The University of Newcastle); Pengfei Chen (Sun Yat-sen University) |
| 16:15 | 15m | Talk | Demystifying Compiler Unstable Feature Usage and Impacts in the Rust Ecosystem | Research Track | Chenghao Li (Zhejiang University); Yifei Wu (Zhejiang University); Wenbo Shen (Zhejiang University, China); Zichen Zhao (Zhejiang University); Rui Chang (Zhejiang University); Chengwei Liu (Nanyang Technological University); Yang Liu (Nanyang Technological University); Kui Ren (Zhejiang University) |
| 16:30 | 15m | Talk | Resource Usage and Optimization Opportunities in Workflows of GitHub Actions | Research Track | |
| 16:45 | 15m | Talk | Revealing Hidden Threats: An Empirical Study of Library Misuse in Smart Contracts | Research Track | Mingyuan Huang (Sun Yat-Sen University); Jiachi Chen (Sun Yat-sen University); Zigui Jiang (Sun Yat-sen University); Zibin Zheng (Sun Yat-sen University) |
| 17:00 | 7m | Talk | A Grounded Theory of Cross-community SECOs: Feedback Diversity vs. Synchronization | Journal-first Papers | Armstrong Foundjem (Queens University); Ellis E. Eghan (University of Cape Coast, Ghana); Bram Adams (Queen's University) |
| 17:07 | 7m | Talk | Studying the Characteristics of AIOps Projects on GitHub | Journal-first Papers | Roozbeh Aghili (Polytechnique Montréal); Heng Li (Polytechnique Montréal); Foutse Khomh (École Polytechnique de Montréal) |
| 17:14 | 7m | Talk | A First Look at Dark Mode in Real-World Android App | Journal-first Papers | Suyu Ma (Monash University); Chunyang Chen (Technical University of Munich (TUM)); Hourieh Khalajzadeh (Deakin University, Australia); John Grundy (Monash University) |
| 17:21 | 7m | Talk | GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions | Demonstrations | Nuno Saavedra (INESC-ID and IST, University of Lisbon); André Silva (KTH Royal Institute of Technology); Martin Monperrus (KTH Royal Institute of Technology) |



