Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing
Parser-based log compressors have been widely explored in recent years because the explosive growth of log volumes makes the compression performance of general-purpose compressors unsatisfactory. These parser-based compressors preprocess logs by grouping them according to the parsing result and then feed the preprocessed files into a general-purpose compressor. However, parser-based compressors have limitations. First, the goals of parsing and compression are misaligned, so the inherent characteristics of logs are not fully exploited. In addition, the performance of parser-based compressors depends on the sample logs and is therefore unstable. Moreover, parser-based compressors often incur long processing times. To address these limitations, we propose Denum, a simple, general log compressor with high compression ratio and speed. The core insight is that a majority of the tokens in logs are numeric tokens (i.e., pure numbers, tokens with only numbers and special characters, and numeric variables) and that compressing them effectively is critical for log compression. Specifically, Denum contains a Numeric Token Parsing module, which extracts all numeric tokens and applies tailored processing methods (e.g., storing the differences of incremental numbers such as timestamps), and a String Processing module, which processes the remaining log content without numbers. The processed files of the two modules are then fed to a general-purpose compressor, which produces the final compressed output. Denum has been evaluated on 16 log datasets, where it achieves an 8.7% to 434.7% higher average compression ratio and a 2.6× to 37.7× faster average compression speed (i.e., 26.2 MB/s) compared to the baselines. Moreover, integrating Denum's Numeric Token Parsing module into existing log compressors improves their average compression ratio by 11.8% and makes their average compression speed 37% faster.
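The idea of separating numeric tokens from the remaining text and storing incremental numbers as differences can be illustrated with a minimal sketch. This is not Denum's implementation: the function names, the `<*>` placeholder, and the simplification that a numeric token is any maximal run of digits are all assumptions made for illustration (the abstract's definition of numeric tokens is broader).

```python
import re

# Assumption: a "numeric token" is simplified here to a maximal run of digits.
NUM = re.compile(r"\d+")
PLACEHOLDER = "<*>"  # hypothetical marker left in the string stream


def split_numeric(line):
    """Split a log line into a digit-free template and its digit runs.

    Digit runs are kept as strings so leading zeros survive a round trip.
    """
    numbers = NUM.findall(line)
    template = NUM.sub(PLACEHOLDER, line)
    return template, numbers


def delta_encode(values):
    """Keep the first value, then store each value as the difference
    from its predecessor (useful for slowly increasing timestamps)."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    decoded = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        decoded.append(total)
    return decoded


if __name__ == "__main__":
    # Made-up log line and timestamps, for illustration only.
    print(split_numeric("12:00:01 worker 7 finished in 35 ms"))
    stamps = [1700000000, 1700000001, 1700000003, 1700000003]
    print(delta_encode(stamps))
```

In this sketch the template strings and the (delta-encoded) numbers would be written to separate streams before being handed to a general-purpose compressor; small repeated differences tend to compress better than long, nearly identical absolute values.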
Wed 30 Oct
Displayed time zone: Pacific Time (US & Canada)
10:30 - 12:00 | Log and trace; failure and fault | Research Papers / Industry Showcase at Carr | Chair(s): Yiming Tang (Rochester Institute of Technology)
10:30 15m Talk | Demonstration-Free: Towards More Practical Log Parsing with Large Language Models | Research Papers
10:45 15m Talk | Unlocking the Power of Numbers: Log Compression via Numeric Token Parsing | Research Papers
11:00 15m Talk | Towards Synthetic Trace Generation of Modeling Operations using In-Context Learning Approach | Research Papers | Vittoriano Muttillo (University of Teramo), Claudio Di Sipio (University of L'Aquila), Riccardo Rubei (University of L'Aquila), Luca Berardinelli (Johannes Kepler University Linz), MohammadHadi Dehghani (Johannes Kepler University Linz)
11:15 15m Talk | DeployFix: Dynamic Repair of Software Deployment Failures via Constraint Solving | Industry Showcase | Haoyu Liao (East China Normal University), Jianmei Guo (East China Normal University), Bo Huang (East China Normal University), Yujie Han (East China Normal University), Dingyu Yang (Zhejiang University), Kai Shi (Alibaba Group), Jonathan Ding (Intel), Guoyao Xu (Alibaba Group), Guodong Yang (Alibaba Group), Liping Zhang (Alibaba Group)
11:30 15m Talk | FAIL: Analyzing Software Failures from the News Using LLMs | Research Papers | Dharun Anandayuvaraj (Purdue University), Matthew Campbell (Purdue University), Arav Tewari (Purdue University), James C. Davis (Purdue University)
11:45 15m Talk | Do not neglect what's on your hands: localizing software faults with exception trigger stream | Research Papers | Xihao Zhang (School of Computer Science, Wuhan University), Yi Song (School of Computer Science, Wuhan University), Xiaoyuan Xie (Wuhan University), Qi Xin (Wuhan University), Chenliang Xing (School of Computer Science, Wuhan University)