Anomaly Detection in a Large-scale Cloud Platform (ICSE 2021 - SEIP - Software Engineering in Practice)

Who

Mohammad Saiful Islam, William Pourmajidi, Lei Zhang, John Steinbacher, Tony Erwin, Andriy Miranskyy

Track

ICSE 2021 SEIP - Software Engineering in Practice

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 26 May 2021 19:10 - 19:30 at Blended Sessions Room 4 - 2.5.4. Some Big Companies' Practices: Cases at Facebook, Google & IBM Chair(s): Davide Falessi
Thu 27 May 2021 07:10 - 07:30 at Blended Sessions Room 4 - 2.5.4. Some Big Companies' Practices: Cases at Facebook, Google & IBM

Abstract

Cloud computing is ubiquitous: more and more companies are moving their workloads into the Cloud. However, this rise in popularity challenges Cloud service providers, as they need to monitor the quality of their ever-growing offerings effectively. To address the challenge, we designed and implemented an automated monitoring system for the IBM Cloud Platform. This monitoring system utilizes deep learning neural networks to detect anomalies in near-real-time in multiple Platform components simultaneously.

After running the system for a year, we observed that the proposed solution frees the DevOps team’s time and human resources from manually monitoring thousands of Cloud components. Moreover, it increases customer satisfaction by reducing the risk of Cloud outages.

In this paper, we share our solution's architecture, implementation notes, and best practices that emerged while evolving the monitoring system. They can be leveraged by other researchers and practitioners to build anomaly detectors for complex systems.

Link to Preprint

https://arxiv.org/abs/2010.10966

Mohammad Saiful Islam, Ryerson University, Canada
William Pourmajidi, Ryerson University, Canada
Lei Zhang, Ryerson University, Canada
John Steinbacher, IBM, Canada
Tony Erwin, IBM, United States
Andriy Miranskyy, Ryerson University, Canada

Session Program

Wed 26 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

18:50 - 19:50	2.5.4. Some Big Companies' Practices: Cases at Facebook, Google & IBM. SEIP - Software Engineering in Practice / Technical Track, at Blended Sessions Room 4 (+12h). Chair(s): Davide Falessi, California Polytechnic State University

18:50 20m Paper		Testing Web Enabled Simulation at Scale Using Metamorphic Testing. SEIP - Software Engineering in Practice. Mark Harman (Facebook, Inc.), John Ahlgren (Facebook), Maria Eugenia Berezin (Facebook), Elena Dulskyte (Facebook), Inna Dvortsova (Facebook), Johann George (Facebook), Natalija Gucevska (Facebook), Erik Meijer, Justin Spahr-Summers (Facebook), Kinga Bojarczuk (Facebook), Silvia Sapora (Facebook), Maria Lomeli (Facebook). [Pre-print, Media Attached]
19:10 20m Paper		Anomaly Detection in a Large-scale Cloud Platform. SEIP - Software Engineering in Practice. Mohammad Saiful Islam (Ryerson University), William Pourmajidi (Ryerson University), Lei Zhang (Ryerson University), John Steinbacher (IBM), Tony Erwin (IBM), Andriy Miranskyy (Ryerson University). [Pre-print, Media Attached]
19:30 20m Paper		Smart Build Targets Batching Service at Google. SEIP - Software Engineering in Practice. Kaiyuan Wang (Google, USA), Daniel Rall (Google), Greg Tener (Google), Vijay Gullapalli (Google), Xin Huang, Ahmed Gad (Google). [Pre-print, Media Attached]

Thu 27 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

06:50 - 07:50	2.5.4. Some Big Companies' Practices: Cases at Facebook, Google & IBM. SEIP - Software Engineering in Practice / Technical Track, at Blended Sessions Room 4

06:50 20m Paper		Testing Web Enabled Simulation at Scale Using Metamorphic Testing. SEIP - Software Engineering in Practice. Mark Harman (Facebook, Inc.), John Ahlgren (Facebook), Maria Eugenia Berezin (Facebook), Elena Dulskyte (Facebook), Inna Dvortsova (Facebook), Johann George (Facebook), Natalija Gucevska (Facebook), Erik Meijer, Justin Spahr-Summers (Facebook), Kinga Bojarczuk (Facebook), Silvia Sapora (Facebook), Maria Lomeli (Facebook). [Pre-print, Media Attached]
07:10 20m Paper		Anomaly Detection in a Large-scale Cloud Platform. SEIP - Software Engineering in Practice. Mohammad Saiful Islam (Ryerson University), William Pourmajidi (Ryerson University), Lei Zhang (Ryerson University), John Steinbacher (IBM), Tony Erwin (IBM), Andriy Miranskyy (Ryerson University). [Pre-print, Media Attached]
07:30 20m Paper		Smart Build Targets Batching Service at Google. SEIP - Software Engineering in Practice. Kaiyuan Wang (Google, USA), Daniel Rall (Google), Greg Tener (Google), Vijay Gullapalli (Google), Xin Huang, Ahmed Gad (Google). [Pre-print, Media Attached]