ASE 2021
Wed 17 Nov 2021 22:20 - 22:40 at Koala - Performance

Service reliability is one of the key challenges that cloud providers have to deal with. In cloud systems, unplanned service failures may have severe cascading impacts on dependent services, degrading customer satisfaction. Predicting these cascading impacts accurately and efficiently is critical to the operation and maintenance of cloud systems. Existing approaches use distributed tracing to identify whether one service depends on another, but no prior work has focused on discriminating the intensity of the dependency between cloud services. In this paper, we empirically study the outages and the procedure for failure diagnosis at two cloud providers to motivate the definition of the intensity of dependency. We then propose AIM, the first approach to predict the intensity of dependencies between cloud microservices. AIM first generates a set of candidate dependency pairs from the spans. It then represents the status of each cloud service with a multivariate time series aggregated from the spans. Using this representation, AIM calculates the similarities between the statuses of the caller and the callee of each candidate pair. Finally, AIM aggregates the similarities into a single value that serves as the intensity of the dependency. We evaluate AIM on data collected from an open-source microservice benchmark and from a cloud system in production. The experimental results show that AIM can efficiently and accurately predict the intensity of dependencies. We further demonstrate the usefulness of our method in a large-scale cloud system. We plan to release both datasets to facilitate future studies.
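
The four steps named in the abstract (candidate pairs, per-service status series, caller/callee similarity, aggregation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the span schema, the status indicators, the similarity measure, or the aggregation, so everything concrete here is an assumption: the span fields (`id`, `parent`, `service`, `start`, `duration`, `error`), the three indicators in `METRICS`, absolute Pearson correlation as the similarity, and a plain mean as the aggregation.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Assumed status indicators; the abstract does not name the ones AIM uses.
METRICS = ("count", "mean_duration", "error_rate")

def candidate_pairs(spans):
    """Step 1: (caller, callee) service pairs taken from parent-child span links."""
    service_of = {s["id"]: s["service"] for s in spans}
    pairs = set()
    for s in spans:
        caller = service_of.get(s["parent"])
        if caller is not None and caller != s["service"]:
            pairs.add((caller, s["service"]))
    return pairs

def status_series(spans, window_s, n_windows):
    """Step 2: one multivariate time series per service, bucketed by start time."""
    buckets = defaultdict(lambda: [[] for _ in range(n_windows)])
    for s in spans:
        idx = int(s["start"] // window_s)
        if 0 <= idx < n_windows:
            buckets[s["service"]][idx].append(s)
    series = {}
    for service, windows in buckets.items():
        series[service] = {
            "count": [float(len(b)) for b in windows],
            "mean_duration": [mean(x["duration"] for x in b) if b else 0.0
                              for b in windows],
            "error_rate": [mean(1.0 if x["error"] else 0.0 for x in b) if b else 0.0
                           for b in windows],
        }
    return series

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def intensity(series, caller, callee):
    """Steps 3 and 4: per-metric similarity, then a mean as the unified value."""
    sims = [abs(pearson(series[caller][m], series[callee][m])) for m in METRICS]
    return mean(sims)

def demo_spans():
    """Synthetic spans: every 'frontend' request calls 'cart' exactly once."""
    spans = []
    for w, n_requests in enumerate([1, 2, 3, 1]):
        for i in range(n_requests):
            failed = (w == 1)  # both services fail only in the second window
            root_id = f"frontend-{w}-{i}"
            spans.append({"id": root_id, "parent": None, "service": "frontend",
                          "start": w * 10 + i, "duration": 10.0 * (w + 1),
                          "error": failed})
            spans.append({"id": f"cart-{w}-{i}", "parent": root_id,
                          "service": "cart", "start": w * 10 + i,
                          "duration": 5.0 * (w + 1), "error": failed})
    return spans

spans = demo_spans()
pairs = candidate_pairs(spans)
series = status_series(spans, window_s=10, n_windows=4)
scores = {pair: intensity(series, *pair) for pair in pairs}
```

In this synthetic data the callee's load, latency, and errors track the caller's in every window, so the pair scores close to the maximum of 1. A callee that is invoked only occasionally, or whose failures the caller tolerates, would score lower under the same assumptions.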

