ICSE 2021
Mon 17 May - Sat 5 June 2021

Configuration space complexity makes big-data software systems hard to configure well. Consider Hadoop: with over nine hundred parameters, developers often just use the default configurations provided with Hadoop distributions. The opportunity costs in lost performance are significant. Popular learning-based approaches to auto-tuning software do not scale well for big-data systems because of the high cost of collecting training data. We present a new method based on a combination of Evolutionary Markov Chain Monte Carlo (EMCMC) sampling and cost reduction techniques to cost-effectively find better-performing configurations for big-data systems. For cost reduction, we developed and experimentally validated two approaches: using scaled-up big-data jobs as proxies for the objective function for larger jobs, and using a dynamic job similarity measure to infer that results obtained for one kind of big-data problem will work well for similar problems. Our experimental results suggest that our approach can significantly and cost-effectively improve the performance of big-data systems, and that it outperforms competing approaches based on random sampling, basic genetic algorithms (GA), and predictive model learning.

Conference Day
Fri 28 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

15:05 - 16:05
4.3.2. Performance Modeling of Highly Configurable Software Systems
Technical Track / Journal-First Papers at Blended Sessions Room 2
Chair(s): Carolyn Seaman (University of Maryland Baltimore County)
15:05
20m
Paper
White-Box Performance-Influence Models: A Profiling and Learning Approach (Artifact Reusable, Artifact Available)
Technical Track
Max Weber (Leipzig University), Sven Apel (Saarland University), Norbert Siegmund (Leipzig University)
15:25
20m
Paper
White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems
Technical Track
Miguel Velez (Carnegie Mellon University), Pooyan Jamshidi (University of South Carolina), Norbert Siegmund (Leipzig University), Sven Apel (Saarland University), Christian Kästner (Carnegie Mellon University)
15:45
20m
Paper
ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance
Journal-First Papers
Rahul Krishna (Columbia University, USA), Chong Tang (Microsoft), Kevin Sullivan (University of Virginia), Baishakhi Ray (Columbia University, USA)

Conference Day
Sat 29 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

03:05 - 04:05
4.3.2. Performance Modeling of Highly Configurable Software Systems
Technical Track / Journal-First Papers at Blended Sessions Room 2
03:05
20m
Paper
White-Box Performance-Influence Models: A Profiling and Learning Approach (Artifact Reusable, Artifact Available)
Technical Track
Max Weber (Leipzig University), Sven Apel (Saarland University), Norbert Siegmund (Leipzig University)
03:25
20m
Paper
White-Box Analysis over Machine Learning: Modeling Performance of Configurable Systems
Technical Track
Miguel Velez (Carnegie Mellon University), Pooyan Jamshidi (University of South Carolina), Norbert Siegmund (Leipzig University), Sven Apel (Saarland University), Christian Kästner (Carnegie Mellon University)
03:45
20m
Paper
ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance
Journal-First Papers
Rahul Krishna (Columbia University, USA), Chong Tang (Microsoft), Kevin Sullivan (University of Virginia), Baishakhi Ray (Columbia University, USA)