Machine Learning meets Software Performance: Optimization, Transfer Learning, and Counterfactual Causal Inference (ASE 2020 - Tutorials)

Track

ASE 2020 Tutorials

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Fri 25 Sep 2020 16:00 - 17:50 at Platypus - Tutorial 4: ML and SE

Abstract

A wide range of modern software-intensive systems (e.g., autonomous systems, big data analytics, robotics, deep neural architectures) are built to be configurable. These highly-configurable systems offer a rich space for adaptation to different domains and tasks. Developers and users often need to reason about the performance of such systems, making tradeoffs to change specific quality attributes or detecting performance anomalies. For instance, the developers of image recognition mobile apps are interested not only in learning which deep neural architectures are accurate enough to classify their images correctly, but also in which architectures consume the least power on the mobile devices on which they are deployed. Recent research has focused on models built from performance measurements obtained by instrumenting the system. However, the fundamental problem is that the learning techniques for building a reliable performance model do not scale well, simply because the configuration space of these systems grows exponentially with the number of options and is impossible to explore exhaustively. For example, it would take over 60 years to explore the whole configuration space of a system with 25 binary options.
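The 60-year figure can be checked with a short back-of-the-envelope calculation. The abstract does not state how long one measurement takes; the sketch below assumes one minute per configuration, which is an illustrative assumption, and the function name is hypothetical.

```python
# Back-of-the-envelope cost of exhaustively measuring a configuration space.
# ASSUMPTION: one measurement takes one minute (not stated in the abstract).
MINUTES_PER_YEAR = 60 * 24 * 365


def exhaustive_years(num_binary_options, minutes_per_measurement=1.0):
    """Years needed to measure every configuration of n binary options."""
    num_configs = 2 ** num_binary_options
    return num_configs * minutes_per_measurement / MINUTES_PER_YEAR


years = exhaustive_years(25)
print(f"{2 ** 25:,} configurations -> {years:.1f} years")
# prints: 33,554,432 configurations -> 63.8 years
```

Each additional binary option doubles the total, so under the same assumption 26 options would already need more than 127 years.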

In this tutorial, I will start by motivating the configuration space explosion problem based on my previous experience with large-scale big data systems in industry. I will then present transfer learning, as well as other machine learning techniques including multi-objective Bayesian optimization, to tackle the sample efficiency challenge: instead of taking all measurements from the real system, we learn the performance model using samples from cheap sources, such as simulators that approximate the performance of the real system with fair fidelity and at a low cost. Results show that despite the high cost of measurement on the real system, learning performance models can become surprisingly cheap as long as certain properties are reused across environments. In the second half of the talk, I will present empirical evidence that lays a foundation for a theory explaining why and when transfer learning works, by showing the similarities of performance behavior across environments. I will present observations of the impact of environmental changes (such as changes to hardware, workload, and software versions) for a selected set of configurable systems from different domains to identify the key elements that can be exploited for transfer learning. These observations demonstrate a promising path for building efficient, reliable, and dependable software systems, as well as theoretically sound approaches for tackling performance optimization, testing, and debugging. Finally, I will share some promising research directions, including our recent progress on a performance debugging approach based on counterfactual causal inference.
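The idea of learning from a cheap source can be illustrated with a toy sketch. Everything below is assumed for illustration and is not the tutorial's actual method or data: the synthetic `simulator` and `real_system` functions, the linear relationship between them, and the sample sizes are all hypothetical. A model is fit on many cheap simulator samples, then a two-parameter linear map to the real system is fit from only five expensive measurements, and the result is compared with a model trained on those five measurements alone.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OPTIONS = 10
# Hypothetical per-option performance influences (synthetic, for illustration).
w = rng.uniform(1.0, 5.0, size=N_OPTIONS)


def simulator(X):
    """Cheap source: synthetic performance with small measurement noise."""
    return X @ w + 10.0 + rng.normal(0.0, 0.1, size=len(X))


def real_system(X):
    """Expensive target: ASSUMED to be a linear transform of the source."""
    return 1.8 * (X @ w + 10.0) + 4.0 + rng.normal(0.0, 0.1, size=len(X))


def sample_configs(n):
    """Draw n random configurations of N_OPTIONS binary options."""
    return rng.integers(0, 2, size=(n, N_OPTIONS)).astype(float)


def fit_linear(X, y):
    """Least-squares linear model with an intercept term."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


# Many cheap samples from the simulator, very few from the real system.
X_src = sample_configs(200)
src_model = fit_linear(X_src, simulator(X_src))
X_tgt = sample_configs(5)
y_tgt = real_system(X_tgt)

# Transfer: map source predictions to target measurements (2 parameters).
shift = fit_linear(predict_linear(src_model, X_tgt).reshape(-1, 1), y_tgt)
# Baseline: fit all 11 parameters from the 5 target samples only.
target_only = fit_linear(X_tgt, y_tgt)

X_test = sample_configs(500)
y_test = real_system(X_test)
src_pred = predict_linear(src_model, X_test).reshape(-1, 1)
mae_transfer = np.mean(np.abs(predict_linear(shift, src_pred) - y_test))
mae_target_only = np.mean(np.abs(predict_linear(target_only, X_test) - y_test))
print(f"transfer MAE: {mae_transfer:.2f}, target-only MAE: {mae_target_only:.2f}")
```

The transfer model only has to estimate a slope and an offset from the expensive samples, which is why it can do well here with five measurements. That advantage depends entirely on the assumed similarity between source and target; when the two environments are not related this simply, such a map would not be expected to help.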

Outline

Background on computer system performance
Case study: A composable highly-configurable system
Performance analysis and optimization
Transfer learning for performance analysis and optimization
Research direction 1: Cost-aware multi-objective Bayesian optimization for MLSys
Research direction 2: Counterfactual causal inference for performance debugging

Target audience

This tutorial is aimed at practitioners as well as researchers who would like to go deeper into understanding new and potentially powerful approaches for modern highly-configurable systems. It is also suitable for students (both undergraduate and graduate) who want to learn about potential research directions and how they can find a fruitful niche at the intersection of machine learning, systems, and software engineering.

Bio

Pooyan Jamshidi is an Assistant Professor at the University of South Carolina. He directs the AISys Lab, where he investigates the development of novel algorithmic and theoretically principled methods for machine learning systems. Prior to his current position, he was a research associate at Carnegie Mellon University and Imperial College London, where he primarily worked on transfer learning for performance understanding of highly-configurable systems, including robotics and big data systems. Pooyan’s general research interests are at the intersection of systems/software and machine learning. He received his Ph.D. in Computer Science from Dublin City University in 2014, and M.S. and B.S. degrees in Computer Science and Math from the Amirkabir University of Technology in 2003 and 2006, respectively.