Toward Adaptive Tracing: Efficient System Behavior Analysis using Language Models
Tracing, a technique essential for unraveling the complexities of computer systems’ behavior, involves the organized collection of low-level events, enabling anomaly identification, performance debugging, and root-cause analysis. However, the significant performance and storage overhead it imposes on large-scale systems has made it a less favorable tool for system maintenance. Previous efforts to mitigate tracing’s burden have mostly centered on automating trace analysis and have largely neglected the duration of events, a significant part of the information provided by tracers. To address these challenges, we propose an Adaptive Tracing method that leverages language models and kernel traces for precise system modeling. This approach minimizes overhead by recording detailed traces only during significant behavioral shifts and by focusing on the subsystems related to the root cause. Using a multi-task model that incorporates system call sequences and durations, we propose a root-cause analysis method that enhances model transparency and enables targeted system tracing. Evaluation on a dataset of normal and noisy traces from an Apache server shows that our Adaptive Tracer captures events related to abrupt changes with only 5.8% loss while reducing the collected trace by 77.1%, and determines the respective noise set with 92.7% accuracy, outperforming previous state-of-the-art trace models by 20.4%.
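As a rough illustration of the idea of recording detailed traces only during significant behavioral shifts, the following minimal sketch toggles detailed tracing from a rolling mean of per-event "surprise" scores. It is not the method evaluated in this work: the class `AdaptiveTraceController`, the helper `select_detailed_events`, the window size, and both thresholds are hypothetical, and the surprise scores are assumed to come from some language model over system calls (for example, a negative log-likelihood per event).

```python
from collections import deque


class AdaptiveTraceController:
    """Hypothetical controller: turns detailed tracing on and off
    based on a rolling mean of per-event surprise scores."""

    def __init__(self, window=4, on_threshold=2.0, off_threshold=1.0):
        # Keep only the most recent `window` scores.
        self.scores = deque(maxlen=window)
        # Two thresholds give hysteresis, so tracing does not flap
        # when the rolling mean hovers around a single cut-off.
        self.on_threshold = on_threshold
        self.off_threshold = off_threshold
        self.detailed = False

    def update(self, surprise):
        """Record one score and return whether detailed tracing is active."""
        self.scores.append(surprise)
        mean = sum(self.scores) / len(self.scores)
        if not self.detailed and mean > self.on_threshold:
            self.detailed = True
        elif self.detailed and mean < self.off_threshold:
            self.detailed = False
        return self.detailed


def select_detailed_events(events, surprises, controller):
    """Return the events that would be recorded in detail."""
    kept = []
    for event, surprise in zip(events, surprises):
        if controller.update(surprise):
            kept.append(event)
    return kept


if __name__ == "__main__":
    # Assumed toy data: system call names with made-up surprise scores.
    events = ["read", "write", "futex", "futex", "read", "write", "close"]
    surprises = [0.5, 0.5, 4.0, 4.0, 0.5, 0.5, 0.5]
    controller = AdaptiveTraceController(window=2)
    print(select_detailed_events(events, surprises, controller))
```

In this sketch the separate on and off thresholds are a design choice for stability; a real deployment would need to calibrate them against the model's score distribution on normal traces.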