Reproducing Performance Bug Reports in Server Applications: The Researchers' Experiences
Software performance is critical to the quality of software systems. Performance bugs can cause significant performance degradation, such as long response times and low throughput, which ultimately lead to poor user experience. Many modern software projects use bug tracking systems that allow developers and users to report issues they have identified in the software. While bug reports are intended to help developers understand and fix bugs, researchers also use them extensively to find benchmarks for evaluating their testing and debugging approaches. Researchers often rely on the description in a confirmed performance bug report to reproduce the bug for use in their evaluations. Although researchers spend considerable time and effort searching bug repositories for usable performance bugs, they often obtain only a few. Reproducing performance bugs is difficult even for domain experts such as developers. Compared to functional bugs, performance bugs are substantially harder to reproduce because they often manifest only with large inputs and under specific execution conditions. The information disclosed in a bug report may not be sufficient for researchers to reproduce the performance bug, which limits the usefulness of bug repositories as a source of benchmarks. Our study examines performance bug reproduction from the perspective of non-domain experts such as software engineering researchers. One key difference from prior work is that we specifically target confirmed performance bugs and report why software engineering researchers may fail to reproduce them, rather than understanding and characterizing non-reproducible bugs from developers' viewpoints.
Therefore, a failed-to-reproduce performance bug in this work is defined as a developer-confirmed, reproducible performance bug that researchers cannot reproduce due to a lack of domain knowledge or environmental limitations. The goal of this study is to share our experience as software engineering researchers in reproducing performance bugs by investigating the impact of different factors identified in confirmed performance bug reports from open-source projects. We studied the characteristics of confirmed performance bugs by reproducing them using only the information available in the bug reports, in order to examine the challenges of performance bug reproduction. Over six months, we spent more than 800 hours studying and reproducing 93 confirmed performance bugs randomly sampled from two large-scale open-source server applications. We (1) studied the characteristics of the reproduced performance bug reports; (2) summarized the causes of failed-to-reproduce confirmed performance bug reports; (3) shared our experience with workarounds that improve the bug reproduction success rate; and (4) delivered a virtual machine image containing a set of 17 ready-to-execute performance bug benchmarks. The findings of our study provide guidance and suggestions to help researchers understand, evaluate, and successfully reproduce performance bugs. We also provide implications for both researchers and practitioners on developing techniques for testing and diagnosing performance bugs, improving the quality of bug reports, and detecting failed-to-reproduce bug reports.
Wed 23 Sep (times in UTC)
17:10 - 18:10
|Code to Comment "Translation": Data, Metrics, Baselining & Evaluation|
|Reproducing Performance Bug Reports in Server Applications: The Researchers' Experiences|
Xue Han (University of Kentucky), Daniel Carroll (University of Kentucky), Tingting Yu (University of Kentucky)
|Exploring the Architectural Impact of Possible Dependencies in Python software|