Multi-tier distributed systems are composed of several distributed nodes organized in layered tiers. Each tier implements a set of conceptually homogeneous functionalities that provide services to the tier above and use the services of the tier below. The distributed computing infrastructure and the connections among the vertical and horizontal structures make multi-tier distributed systems extremely complex and difficult to understand, even for their developers. Indeed, runtime failures are becoming the norm rather than the exception in many multi-tier distributed systems [2–4]. Predicting failures at runtime is essential to trigger automatic and operator-driven reactions that either avoid the incoming failures or mitigate their impact on overall system reliability. Current approaches for predicting failures exploit either anomaly-based or signature-based strategies. Anomaly-based strategies treat behaviors that deviate significantly from normal system behavior as symptoms of failures that may occur in the near future. Signature-based strategies rely on known patterns of failure-prone behaviors, called signatures, to predict failures that match those patterns. Anomaly-based techniques suffer from false positives, while signature-based techniques cannot cope with emerging failures. In our paper, we present PreMiSE (PREdicting failures in Multi-tIer distributed SystEms), a novel approach to accurately predict failures and precisely locate the responsible faults in multi-tier distributed systems. PreMiSE combines signature-based and anomaly-based approaches to reduce the false positive rate of the latter and improve the accuracy of the former.
As illustrated in Figure 1, PreMiSE (i) monitors the status of the system by collecting a large set of performance indicators that we refer to as Key Performance Indicators (KPIs) (KPI monitoring), (ii) identifies deviations from normal behavior by pinpointing anomalous KPIs with anomaly-based techniques (Anomaly detection), and (iii) predicts incoming failures by recognizing symptomatic sets of anomalous KPIs with signature-based techniques (Signature-based failure prediction). We evaluated PreMiSE on a prototype multi-tier distributed architecture that implements telecommunication services. The experimental data indicate that PreMiSE can predict failures and locate faults with high precision and low false positive rates for some relevant classes of faults, thus confirming our research hypotheses.
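The three steps can be illustrated with a minimal Python sketch. It is not the PreMiSE implementation: the z-score test and the subset match on signatures are simple stand-ins chosen for illustration, and the KPI names, the threshold, and the example signature are all hypothetical.

```python
from statistics import mean, stdev


def anomalous_kpis(baseline, sample, threshold=3.0):
    """Step (ii): return the KPIs whose sampled value deviates from the
    baseline by more than `threshold` standard deviations (illustrative z-score test)."""
    anomalies = set()
    for name, history in baseline.items():
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Constant baseline: any change counts as a deviation.
            if sample[name] != mu:
                anomalies.add(name)
        elif abs(sample[name] - mu) / sigma > threshold:
            anomalies.add(name)
    return anomalies


def predict_failure(anomalies, signatures):
    """Step (iii): return the (failure type, suspected node) of the first
    signature whose KPI set is fully contained in the anomalies, else None."""
    for kpi_set, prediction in signatures:
        if kpi_set <= anomalies:
            return prediction
    return None


# Step (i): KPI values collected during normal operation (hypothetical data).
baseline = {
    "db.cpu": [20, 22, 21, 19, 20],
    "db.mem": [50, 51, 49, 50, 52],
    "web.latency": [100, 105, 98, 102, 101],
}

# Hypothetical signature: this pair of anomalous KPIs precedes a known failure.
signatures = [
    (frozenset({"db.cpu", "db.mem"}), ("memory leak", "db")),
]

# Both KPIs of the signature are anomalous, so a failure is predicted.
failing_sample = {"db.cpu": 60, "db.mem": 90, "web.latency": 103}
failing_anomalies = anomalous_kpis(baseline, failing_sample)
failing_prediction = predict_failure(failing_anomalies, signatures)

# A single anomalous KPI matches no signature, so no failure is predicted.
noisy_sample = {"db.cpu": 60, "db.mem": 50, "web.latency": 101}
noisy_anomalies = anomalous_kpis(baseline, noisy_sample)
noisy_prediction = predict_failure(noisy_anomalies, signatures)
```

The second sample shows why the combination helps: an isolated anomaly that would raise an alarm in a purely anomaly-based detector is ignored because it matches no known signature.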
Thu 24 Sep (times displayed in UTC, Coordinated Universal Time)