Predicting unstable software benchmarks using static source code features
Fri 13 May 2022, 03:00 - 03:05 | ICSE room 2 (odd hours) | Evaluation and Performance | Chair(s): Massimiliano Di Penta
Thu 26 May 2022, 11:20 - 11:25 | Room 304+305 | Papers 13: Program Repair and Performance | Chair(s): Lars Grunske
Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with an AUC between 0.79 and 0.90 and an MCC between 0.43 and 0.68. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions, and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively use machine learning models to predict, ahead of execution, whether a benchmark will be stable.
This enables spending precious testing time on reliable benchmarks, supports developers in identifying unstable benchmarks during development, allows unstable benchmarks to be repeated more often, makes it possible to estimate stability in scenarios where repeated benchmark execution is infeasible or impossible, and warns developers when new benchmarks, or existing benchmarks executed in new environments, will be unstable.
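The paper extracts its 58 features from Go source code with its own static analysis tooling. As a loose, hypothetical analogy only, the idea of statically counting language elements can be sketched in Python against Python code using the standard `ast` module; the feature names and the snippet below are illustrative and are not the paper's feature set or tooling:

```python
import ast

def count_static_features(source: str) -> dict:
    """Count a few benchmark-style static features in Python source.

    A loose analogy to the paper's feature categories: meta information
    (lines of code), language elements (loops, conditionals, nesting),
    and calls that could impact performance.
    """
    tree = ast.parse(source)
    features = {"loc": len(source.splitlines()), "loops": 0,
                "conditionals": 0, "calls": 0, "nested_loops": 0}
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.While)):
            features["loops"] += 1
            # A loop that contains another loop counts as nested.
            if any(isinstance(inner, (ast.For, ast.While))
                   for inner in ast.walk(node) if inner is not node):
                features["nested_loops"] += 1
        elif isinstance(node, ast.If):
            features["conditionals"] += 1
        elif isinstance(node, ast.Call):
            features["calls"] += 1
    return features

snippet = """
for i in range(10):
    for j in range(10):
        if i < j:
            print(i, j)
"""
print(count_static_features(snippet))
```

For Go, the analogous counting would walk the `go/ast` syntax tree of the benchmark and of every function it calls, since the abstract notes that features of the called code matter most.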
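The classification step described in the abstract pairs each benchmark's feature vector with a stable/unstable label and scores models with AUC and MCC. A minimal sketch of that setup, using scikit-learn's `RandomForestClassifier` on purely synthetic data (nothing below reproduces the paper's dataset, labels, or reported numbers):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# 58 synthetic "static source code features" per benchmark.
X = rng.normal(size=(n, 58))
# Synthetic label: pretend the first five features drive instability.
y = (X[:, :5].sum(axis=1) + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]   # scores for AUC
pred = clf.predict(X_te)                # hard labels for MCC
print(f"AUC: {roc_auc_score(y_te, proba):.2f}")
print(f"MCC: {matthews_corrcoef(y_te, pred):.2f}")
```

In a real replication one would also inspect `clf.feature_importances_`, mirroring the paper's per-feature and per-category importance analyses.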
Wed 11 May | Displayed time zone: Eastern Time (US & Canada)
11:00 - 12:00 | Performance and Reliability | Technical Track / Journal-First Papers | ICSE room 2 (odd hours) | Chair(s): Andrea Zisman (The Open University)
11:00 | 5-minute talk | Predicting unstable software benchmarks using static source code features | Journal-First Papers | Christoph Laaber (Simula Research Laboratory), Mikael Basmaci (University of Zurich), Pasquale Salza (University of Zurich) | Link to publication, DOI, Media Attached
11:05 | 5-minute talk | Evaluating the impact of falsely detected performance bug-inducing changes in JIT models | Journal-First Papers | Sophia Quach (Concordia University), Maxime Lamothe (Polytechnique Montréal), Bram Adams (Queen's University), Yasutaka Kamei (Kyushu University), Weiyi Shang (Concordia University) | Link to publication, DOI, Pre-print, Media Attached
11:10 | 5-minute talk | Using Reinforcement Learning for Load Testing of Video Games | Technical Track | Rosalia Tufano (Università della Svizzera italiana), Simone Scalabrino (University of Molise), Luca Pascarella (Università della Svizzera italiana), Emad Aghajani (Software Institute, USI Università della Svizzera italiana), Rocco Oliveto (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print, Media Attached
11:15 | 5-minute talk | EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries | Technical Track | Jiannan Wang (Purdue University), Thibaud Lutellier (University of Waterloo), Shangshu Qian (Purdue University), Hung Viet Pham (University of Waterloo), Lin Tan (Purdue University) | Pre-print, Media Attached
11:20 | 5-minute talk | Decomposing Software Verification into Off-the-Shelf Components: An Application to CEGAR | Technical Track | Dirk Beyer (LMU Munich, Germany), Jan Haltermann (University of Oldenburg), Thomas Lemberger (LMU Munich), Heike Wehrheim (Carl von Ossietzky Universität Oldenburg) | Pre-print, Media Attached
11:25 | 5-minute talk | Precise Divide-By-Zero Detection with Affirmative Evidence | Technical Track | Yiyuan Guo (The Hong Kong University of Science and Technology; Ant Group), Jinguo Zhou (Ant Group), Peisen Yao (The Hong Kong University of Science and Technology), Qingkai Shi (Ant Group), Charles Zhang (The Hong Kong University of Science and Technology) | DOI, Pre-print, Media Attached
Fri 13 May
Thu 26 May
11:00 - 12:30 | Papers 13: Program Repair and Performance | Technical Track / Journal-First Papers | Room 304+305 | Chair(s): Lars Grunske (Humboldt-Universität zu Berlin)
11:00 | 5-minute talk | Trust Enhancement Issues in Program Repair | Technical Track | Yannic Noller (National University of Singapore), Ridwan Salihin Shariffdeen (National University of Singapore), Xiang Gao (Beihang University, China), Abhik Roychoudhury (National University of Singapore) | Pre-print, Media Attached
11:05 | 5-minute talk | DEAR: A Novel Deep Learning-based Approach for Automated Program Repair | Technical Track | Yi Li (New Jersey Institute of Technology), Shaohua Wang (New Jersey Institute of Technology), Tien N. Nguyen (University of Texas at Dallas) | Pre-print
11:10 | 5-minute talk | Neural Program Repair using Execution-based Backpropagation | Technical Track | He Ye (KTH Royal Institute of Technology), Matias Martinez (University of Valenciennes), Martin Monperrus (KTH Royal Institute of Technology) | Pre-print, Media Attached
11:15 | 5-minute talk | PropR: Property-Based Automatic Program Repair | Technical Track | Matthías Páll Gissurarson (Chalmers University of Technology, Sweden), Leonhard Applis (Delft University of Technology), Annibale Panichella (Delft University of Technology), Arie van Deursen (Delft University of Technology, Netherlands), Dave Sands (Chalmers) | DOI, Pre-print, Media Attached
11:20 | 5-minute talk | Predicting unstable software benchmarks using static source code features | Journal-First Papers | Christoph Laaber (Simula Research Laboratory), Mikael Basmaci (University of Zurich), Pasquale Salza (University of Zurich) | Link to publication, DOI, Media Attached
11:25 | 5-minute talk | Using Reinforcement Learning for Load Testing of Video Games | Technical Track | Rosalia Tufano (Università della Svizzera italiana), Simone Scalabrino (University of Molise), Luca Pascarella (Università della Svizzera italiana), Emad Aghajani (Software Institute, USI Università della Svizzera italiana), Rocco Oliveto (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print, Media Attached
11:30 | 5-minute talk | On Debugging the Performance of Configurable Software Systems: Developer Needs and Tailored Tool Support | Technical Track | Miguel Velez (Carnegie Mellon University), Pooyan Jamshidi (University of South Carolina), Norbert Siegmund (Leipzig University), Sven Apel (Saarland University), Christian Kästner (Carnegie Mellon University) | Pre-print, Media Attached
11:35 | 5-minute talk | Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching | Technical Track | Zhuangbin Chen (Chinese University of Hong Kong, China), Jinyang Liu, Yuxin Su (Sun Yat-sen University), Hongyu Zhang (University of Newcastle), Xiao Ling (Huawei Technologies), Yongqiang Yang (Huawei Technologies), Michael Lyu (The Chinese University of Hong Kong) | Pre-print, Media Attached