ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance (ICSE 2021 - Journal-First Papers)

Who

Rahul Krishna, Chong Tang, Kevin Sullivan, Baishakhi Ray

Track

ICSE 2021 Journal-First Papers

Time Zone

Times are shown in the conference time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna. The GMT offset shown reflects the offset at the moment of the conference.

When

Fri 28 May 2021 15:45 - 16:05 at Blended Sessions Room 2 - 4.3.2. Performance Modeling of Highly Configurable Software Systems. Chair(s): Carolyn Seaman
Sat 29 May 2021 03:45 - 04:05 at Blended Sessions Room 2 - 4.3.2. Performance Modeling of Highly Configurable Software Systems

Abstract

Configuration space complexity makes big-data software systems hard to configure well. Consider Hadoop: with over nine hundred parameters, developers often just use the default configurations provided with Hadoop distributions. The opportunity costs in lost performance are significant. Popular learning-based approaches to auto-tuning software do not scale well for big-data systems because of the high cost of collecting training data. We present a new method based on a combination of Evolutionary Markov Chain Monte Carlo (EMCMC) sampling and cost reduction techniques to cost-effectively find better-performing configurations for big data systems. For cost reduction, we developed and experimentally validated two approaches: using scaled-up big data jobs as proxies for the objective function for larger jobs, and using a dynamic job similarity measure to infer that results obtained for one kind of big data problem will work well for similar problems. Our experimental results suggest that our approach promises to significantly improve the performance of big data systems and that it outperforms competing approaches based on random sampling, basic genetic algorithms (GA), and predictive model learning. These results support the conclusion that our approach has strong demonstrated potential to significantly and cost-effectively improve the performance of big data systems.
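To make the sampling idea in the abstract concrete, the following is a minimal sketch of an evolutionary search with a Metropolis-style acceptance rule over a toy configuration space. It is not the authors' implementation: the parameter names in `SPACE`, the synthetic `toy_cost` function (standing in for a measured job runtime), and the helper names `mutate`, `crossover`, and `emcmc_search` are all hypothetical, and the proposal step is simplified rather than a faithful EMCMC sampler.

```python
import math
import random

# Hypothetical toy configuration space: parameter name -> allowed values.
SPACE = {
    "map_tasks": [1, 2, 4, 8, 16],
    "sort_buffer_mb": [50, 100, 200, 400],
    "compress_output": [False, True],
}


def toy_cost(config):
    """Synthetic stand-in for a measured job runtime (lower is better)."""
    cost = abs(config["map_tasks"] - 8) + abs(config["sort_buffer_mb"] - 200) / 50
    return cost + (0.0 if config["compress_output"] else 1.5)


def mutate(config, rng):
    """Propose a neighbour by resampling one randomly chosen parameter."""
    child = dict(config)
    name = rng.choice(sorted(SPACE))
    child[name] = rng.choice(SPACE[name])
    return child


def crossover(a, b, rng):
    """Build a child that takes each parameter from one of two parents."""
    return {name: (a if rng.random() < 0.5 else b)[name] for name in sorted(SPACE)}


def emcmc_search(cost_fn, generations=30, population=6, temperature=1.0, seed=0):
    """Evolve a population of chains, accepting proposals with a Metropolis rule."""
    rng = random.Random(seed)
    chains = [
        {name: rng.choice(values) for name, values in sorted(SPACE.items())}
        for _ in range(population)
    ]
    costs = [cost_fn(c) for c in chains]
    best_i = min(range(population), key=costs.__getitem__)
    best, best_cost = dict(chains[best_i]), costs[best_i]

    for _ in range(generations):
        for i in range(population):
            partner = chains[rng.randrange(population)]
            proposal = mutate(crossover(chains[i], partner, rng), rng)
            new_cost = cost_fn(proposal)
            # Always accept improvements; accept regressions with a
            # probability that shrinks as the regression grows.
            delta = new_cost - costs[i]
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                chains[i], costs[i] = proposal, new_cost
            # Track the best configuration seen, accepted or not.
            if new_cost < best_cost:
                best, best_cost = dict(proposal), new_cost
    return best, best_cost


if __name__ == "__main__":
    config, cost = emcmc_search(toy_cost)
    print(config, cost)
```

In a real setting, each call to the cost function would mean running a job on a cluster, which is why the abstract pairs the sampler with cost reduction techniques; the fixed `seed` here only keeps the toy run repeatable.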

Link to Publication

https://doi.ieeecomputersociety.org/10.1109/TSE.2020.3007560

Link to Preprint

https://arxiv.org/abs/1910.09644

DOI

https://doi.org/10.1109/TSE.2020.3007560

Rahul Krishna

Columbia University, USA

Chong Tang

Microsoft

Kevin Sullivan

University of Virginia

Baishakhi Ray

Columbia University, USA
