Evaluating the impact of falsely detected performance bug-inducing changes in JIT models
Wed 11 May 2022, 22:05 - 22:10, at ICSE room 1 (even hours) - Requirements and More. Chair(s): Cecile Peraire
Performance bugs bear a heavy cost on both software developers and end-users. Tools that reduce the occurrence, impact, and repair time of performance bugs can therefore provide key assistance to software developers racing to fix these bugs. Classification models that identify defect-prone commits, an approach referred to as Just-In-Time (JIT) Quality Assurance, are known to be useful because they allow developers to review risky commits while the changes are still fresh in their minds, reducing the cost of developing high-quality software. JIT models, however, rely on the SZZ approach to label whether or not a change is bug-inducing. The fixes to performance bugs may be scattered across the source code, far from their bug-inducing locations, which may make SZZ a sub-optimal approach for identifying the bug-inducing commits of performance bugs. Yet, prior studies that leverage or evaluate the SZZ approach do not distinguish performance bugs from other bugs, leading to potential bias in their results.
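For readers unfamiliar with SZZ, the heuristic works backwards from a bug-fixing commit: it blames the lines that the fix deletes, and flags the commits that last touched those lines as bug-inducing candidates. The sketch below illustrates this general idea over a local git checkout; the repository path, the fix-commit hash, and the helper names are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the SZZ heuristic over a local git checkout.
# The repo path and fix-commit hash below are hypothetical.
import subprocess

def deleted_lines(repo, fix_commit):
    """Map each file changed by the fix to the line numbers it deleted."""
    diff = subprocess.check_output(
        ["git", "-C", repo, "diff", "-U0", f"{fix_commit}^", fix_commit],
        text=True)
    path, deleted = None, {}
    for line in diff.splitlines():
        if line.startswith("diff --git"):
            path = None
        elif line.startswith("--- a/"):
            path = line[len("--- a/"):]
        elif line.startswith("@@") and path:
            # Hunk header "@@ -start,count +start,count @@"; the "-" side
            # describes the lines removed from the pre-fix version.
            old = line.split()[1].lstrip("-")
            start, _, count = old.partition(",")
            count = int(count) if count else 1
            deleted.setdefault(path, []).extend(
                range(int(start), int(start) + count))
    return deleted

def szz_candidates(repo, fix_commit):
    """Blame each deleted line in the pre-fix revision; the commits that
    last modified those lines are SZZ's bug-inducing candidates."""
    candidates = set()
    for path, lines in deleted_lines(repo, fix_commit).items():
        for n in lines:
            blame = subprocess.check_output(
                ["git", "-C", repo, "blame", "--porcelain",
                 "-L", f"{n},{n}", f"{fix_commit}^", "--", path],
                text=True)
            candidates.add(blame.split()[0])  # first token: commit hash
    return candidates

print(szz_candidates("cassandra", "abc1234"))  # hypothetical fix commit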
In this paper, we conduct an empirical study on JIT defect prediction for performance bugs. We concentrate on SZZ’s ability to identify the bug-inducing commits of performance bugs in two open-source projects, Cassandra and Hadoop. We verify whether the bug-inducing commits found by SZZ are truly bug-inducing by manually examining the identified commits, cross-referencing fix commits and JIRA bug reports. We then evaluate JIT models by using them to identify bug-inducing commits for performance-related bugs. Our findings show that JIT defect prediction classifies non-performance bug-inducing commits better than performance bug-inducing commits, i.e., the SZZ approach does introduce errors when identifying bug-inducing commits. However, we find that manually correcting these errors in the training data only slightly improves the models. In the absence of a large number of correctly labelled performance bug-inducing commits, our findings show that combining all available training data (i.e., truly performance bug-inducing commits, non-performance bug-inducing commits, and non-bug-inducing commits) yields the best classification results.
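To make the last finding concrete: a JIT model in this setting is a classifier over commit-level change metrics whose training set pools every labelled commit. Below is a minimal sketch using scikit-learn; the file name, feature columns (in the style of Kamei et al.'s change metrics), and label column are assumptions for illustration, not the paper's replication package.

```python
# A minimal sketch of a JIT defect prediction model trained on pooled data.
# "commits.csv" and its column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

commits = pd.read_csv("commits.csv")  # one row per commit
features = ["lines_added", "lines_deleted", "files_touched",
            "developer_experience", "change_entropy"]

# Pool all labelled commits: truly performance bug-inducing,
# non-performance bug-inducing, and non-bug-inducing (label 0).
X = commits[features]
y = commits["bug_inducing"]  # 1 if the commit induced any bug

model = RandomForestClassifier(n_estimators=100, random_state=0)
print("AUC:", cross_val_score(model, X, y, scoring="roc_auc", cv=10).mean())
```

The design choice mirrored here is the one the abstract reports: rather than training only on the small set of manually verified performance bug-inducing commits, all available labelled commits are combined into a single training set.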
Wed 11 May. Displayed time zone: Eastern Time (US & Canada).
11:00 - 12:00 | Performance and Reliability | Technical Track / Journal-First Papers | ICSE room 2 (odd hours) | Chair(s): Andrea Zisman (The Open University)
11:00 | 5m Talk | Predicting unstable software benchmarks using static source code features | Journal-First Papers | Christoph Laaber (Simula Research Laboratory), Mikael Basmaci (University of Zurich), Pasquale Salza (University of Zurich) | Link to publication, DOI, Media Attached
11:05 | 5m Talk | Evaluating the impact of falsely detected performance bug-inducing changes in JIT models | Journal-First Papers | Sophia Quach (Concordia University), Maxime Lamothe (Polytechnique Montréal), Bram Adams (Queen's University), Yasutaka Kamei (Kyushu University), Weiyi Shang (Concordia University) | Link to publication, DOI, Pre-print, Media Attached
11:10 | 5m Talk | Using Reinforcement Learning for Load Testing of Video Games | Technical Track | Rosalia Tufano (Università della Svizzera italiana), Simone Scalabrino (University of Molise), Luca Pascarella (Università della Svizzera italiana, USI), Emad Aghajani (Software Institute, USI Università della Svizzera italiana), Rocco Oliveto (University of Molise), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print, Media Attached
11:15 | 5m Talk | EAGLE: Creating Equivalent Graphs to Test Deep Learning Libraries | Technical Track | Jiannan Wang (Purdue University), Thibaud Lutellier (University of Waterloo), Shangshu Qian (Purdue University), Hung Viet Pham (University of Waterloo), Lin Tan (Purdue University) | Pre-print, Media Attached
11:20 | 5m Talk | Decomposing Software Verification into Off-the-Shelf Components: An Application to CEGAR | Technical Track | Dirk Beyer (LMU Munich, Germany), Jan Haltermann (University of Oldenburg), Thomas Lemberger (LMU Munich), Heike Wehrheim (University of Oldenburg) | Pre-print, Media Attached
11:25 | 5m Talk | Precise Divide-By-Zero Detection with Affirmative Evidence | Technical Track | Yiyuan Guo (The Hong Kong University of Science and Technology / Ant Group), Jinguo Zhou (Ant Group), Peisen Yao (The Hong Kong University of Science and Technology), Qingkai Shi (Ant Group), Charles Zhang (Hong Kong University of Science and Technology) | DOI, Pre-print, Media Attached