Predicting unstable software benchmarks using static source code features
Fri 13 May 2022, 03:00 - 03:05 | ICSE room 2 (odd hours) | Evaluation and Performance | Chair(s): Massimiliano Di Penta
Thu 26 May 2022, 11:20 - 11:25 | Room 304+305 | Papers 13: Program Repair and Performance | Chair(s): Lars Grunske
Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with an AUC between 0.79 and 0.90 and an MCC between 0.43 and 0.68. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions, and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively use machine learning models to predict, ahead of execution, whether a benchmark will be stable.
This enables spending precious testing time on reliable benchmarks, supports developers in identifying unstable benchmarks during development, allows unstable benchmarks to be repeated more often, makes it possible to estimate stability in scenarios where repeated benchmark execution is infeasible or impossible, and warns developers when new benchmarks, or existing benchmarks executed in new environments, will be unstable.
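The paper extracts its 58 features from Go source code with its own static analysis tooling. As a loose, hypothetical analogy only, the idea of statically counting language elements can be sketched in Python against Python code using the standard `ast` module; the feature names and the snippet below are illustrative and are not the paper's feature set or tooling:

```python
import ast

def count_static_features(source: str) -> dict:
    """Count a few benchmark-style static features in Python source.

    A loose analogy to the paper's feature categories: meta information
    (lines of code), language elements (loops, conditionals, nesting),
    and calls that could impact performance.
    """
    tree = ast.parse(source)
    features = {"loc": len(source.splitlines()), "loops": 0,
                "conditionals": 0, "calls": 0, "nested_loops": 0}
    for node in ast.walk(tree):
        if isinstance(node, (ast.For, ast.While)):
            features["loops"] += 1
            # A loop that contains another loop counts as nested.
            if any(isinstance(inner, (ast.For, ast.While))
                   for inner in ast.walk(node) if inner is not node):
                features["nested_loops"] += 1
        elif isinstance(node, ast.If):
            features["conditionals"] += 1
        elif isinstance(node, ast.Call):
            features["calls"] += 1
    return features

snippet = """
for i in range(10):
    for j in range(10):
        if i < j:
            print(i, j)
"""
print(count_static_features(snippet))
```

For Go, the analogous counting would walk the `go/ast` syntax tree of the benchmark and of every function it calls, since the abstract notes that features of the called code matter most.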
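The classification step described in the abstract pairs each benchmark's feature vector with a stable/unstable label and scores models with AUC and MCC. A minimal sketch of that setup, using scikit-learn's `RandomForestClassifier` on purely synthetic data (nothing below reproduces the paper's dataset, labels, or reported numbers):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# 58 synthetic "static source code features" per benchmark.
X = rng.normal(size=(n, 58))
# Synthetic label: pretend the first five features drive instability.
y = (X[:, :5].sum(axis=1) + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]   # scores for AUC
pred = clf.predict(X_te)                # hard labels for MCC
print(f"AUC: {roc_auc_score(y_te, proba):.2f}")
print(f"MCC: {matthews_corrcoef(y_te, pred):.2f}")
```

In a real replication one would also inspect `clf.feature_importances_`, mirroring the paper's per-feature and per-category importance analyses.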
Wed 11 May | Displayed time zone: Eastern Time (US & Canada)
11:00 - 12:00 | Performance and Reliability | Technical Track / Journal-First Papers | ICSE room 2 (odd hours) | Chair(s): Andrea Zisman (The Open University)
11:00 | 5-minute talk | Predicting unstable software benchmarks using static source code features | Journal-First Papers | Christoph Laaber (Simula Research Laboratory), Mikael Basmaci (University of Zurich), Pasquale Salza (University of Zurich) | Link to publication, DOI, Media Attached
11:05 | 5-minute talk | Evaluating the impact of falsely detected performance bug-inducing changes in JIT models | Journal-First Papers | Sophia Quach (Concordia University), Maxime Lamothe (Polytechnique Montréal), Bram Adams (Queen's University), Yasutaka Kamei (Kyushu University), Weiyi Shang (Concordia University) | Link to publication, DOI, Pre-print, Media Attached
11:10 | 5-minute talk | Using Reinforcement Learning for Load Testing of Video Games | Technical Track | Rosalia Tufano (Università della Svizzera italiana), Simone Scalabrino (University of Molise), Luca Pascarella (Università della Svizzera italiana), Emad Aghajani (Software Institute, USI Università della Svizzera italiana), Rocco Oliveto (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print, Media Attached
11:15 | 5-minute talk | EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries | Technical Track | Jiannan Wang (Purdue University), Thibaud Lutellier (University of Waterloo), Shangshu Qian (Purdue University), Hung Viet Pham (University of Waterloo), Lin Tan (Purdue University) | Pre-print, Media Attached
11:20 | 5-minute talk | Decomposing Software Verification into Off-the-Shelf Components: An Application to CEGAR | Technical Track | Dirk Beyer (LMU Munich, Germany), Jan Haltermann (University of Oldenburg), Thomas Lemberger (LMU Munich), Heike Wehrheim (Carl von Ossietzky Universität Oldenburg) | Pre-print, Media Attached
11:25 | 5-minute talk | Precise Divide-By-Zero Detection with Affirmative Evidence | Technical Track | Yiyuan Guo (The Hong Kong University of Science and Technology; Ant Group), Jinguo Zhou (Ant Group), Peisen Yao (The Hong Kong University of Science and Technology), Qingkai Shi (Ant Group), Charles Zhang (The Hong Kong University of Science and Technology) | DOI, Pre-print, Media Attached
Fri 13 May
Thu 26 May
11:00 - 12:30 | Papers 13: Program Repair and Performance | Technical Track / Journal-First Papers | Room 304+305 | Chair(s): Lars Grunske (Humboldt-Universität zu Berlin)
11:00 | 5-minute talk | Trust Enhancement Issues in Program Repair | Technical Track | Yannic Noller (National University of Singapore), Ridwan Salihin Shariffdeen (National University of Singapore), Xiang Gao (Beihang University, China), Abhik Roychoudhury (National University of Singapore) | Pre-print, Media Attached
11:05 | 5-minute talk | DEAR: A Novel Deep Learning-based Approach for Automated Program Repair | Technical Track | Yi Li (New Jersey Institute of Technology), Shaohua Wang (New Jersey Institute of Technology), Tien N. Nguyen (University of Texas at Dallas) | Pre-print
11:10 | 5-minute talk | Neural Program Repair using Execution-based Backpropagation | Technical Track | He Ye (KTH Royal Institute of Technology), Matias Martinez (University of Valenciennes), Martin Monperrus (KTH Royal Institute of Technology) | Pre-print, Media Attached
11:15 | 5-minute talk | PropR: Property-Based Automatic Program Repair | Technical Track | Matthías Páll Gissurarson (Chalmers University of Technology, Sweden), Leonhard Applis (Delft University of Technology), Annibale Panichella (Delft University of Technology), Arie van Deursen (Delft University of Technology, Netherlands), Dave Sands (Chalmers) | DOI, Pre-print, Media Attached
11:20 | 5-minute talk | Predicting unstable software benchmarks using static source code features | Journal-First Papers | Christoph Laaber (Simula Research Laboratory), Mikael Basmaci (University of Zurich), Pasquale Salza (University of Zurich) | Link to publication, DOI, Media Attached
11:25 | 5-minute talk | Using Reinforcement Learning for Load Testing of Video Games | Technical Track | Rosalia Tufano (Università della Svizzera italiana), Simone Scalabrino (University of Molise), Luca Pascarella (Università della Svizzera italiana), Emad Aghajani (Software Institute, USI Università della Svizzera italiana), Rocco Oliveto (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print, Media Attached
11:30 | 5-minute talk | On Debugging the Performance of Configurable Software Systems: Developer Needs and Tailored Tool Support | Technical Track | Miguel Velez (Carnegie Mellon University), Pooyan Jamshidi (University of South Carolina), Norbert Siegmund (Leipzig University), Sven Apel (Saarland University), Christian Kästner (Carnegie Mellon University) | Pre-print, Media Attached
11:35 | 5-minute talk | Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching | Technical Track | Zhuangbin Chen (Chinese University of Hong Kong, China), Jinyang Liu, Yuxin Su (Sun Yat-sen University), Hongyu Zhang (University of Newcastle), Xiao Ling (Huawei Technologies), Yongqiang Yang (Huawei Technologies), Michael Lyu (The Chinese University of Hong Kong) | Pre-print, Media Attached