Despite all efforts to avoid bugs, software sometimes crashes in the field, leaving crash traces as the only information to localize the problem. Prior approaches on localizing where to fix the root cause of a crash do not scale well to ultra-large scale, heterogeneous code bases that contain millions of code files written in multiple programming languages. This paper presents Scaffle, the first scalable bug localization technique, which is based on the key insight to divide the problem into two easier sub-problems. First, a trained machine learning model predicts which lines of a raw crash trace are most informative for localizing the bug. Then, these lines are fed to an information retrieval-based search engine to retrieve file paths in the code base, predicting which file to change to address the crash. The approach does not make any assumptions about the format of a crash trace or the language that produces it. We evaluate Scaffle with tens of thousands of crash traces produced by a large-scale industrial code base that contains millions of possible bug locations and that powers tools used by billions of people. The results show that the approach correctly predicts the file to fix for 40% to 60% (50% to 70%) of all crash traces within the top-1 (top-5) predictions. Moreover, Scaffle improves over several baseline approaches, including an existing classification-based approach, a scalable variant of existing information retrieval-based approaches, and a set of hand-tuned, industrially deployed heuristics.
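The two-step split described in the abstract can be illustrated with a toy sketch. This is not the Scaffle implementation. The function names are hypothetical, `score_line` is a hand-written heuristic standing in for the trained model, and the token-overlap ranking is an assumed simplification of the information retrieval step.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def score_line(line):
    """Hypothetical stand-in for the trained line-relevance model.

    Assumption: a line that contains a path-like token (something with a
    dot, such as 'dir/file.ext') is treated as informative.
    """
    return 1.0 if re.search(r"[\w/]+\.\w+", line) else 0.0


def select_lines(trace, k=2):
    """Step 1: keep the k highest-scoring non-empty lines of the raw trace."""
    lines = [line for line in trace.splitlines() if line.strip()]
    return sorted(lines, key=score_line, reverse=True)[:k]


def rank_files(query_lines, file_paths, top_n=5):
    """Step 2: rank file paths by token overlap with the selected lines."""
    query = Counter(tok for line in query_lines for tok in tokenize(line))
    scored = []
    for path in file_paths:
        overlap = sum(query[tok] for tok in set(tokenize(path)))
        scored.append((overlap, path))
    # Highest overlap first; ties are broken alphabetically by path.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [path for overlap, path in scored[:top_n] if overlap > 0]


if __name__ == "__main__":
    # Made-up trace and file paths, for illustration only.
    trace = (
        "Fatal error occurred\n"
        "  at render (ui/feed/story_view.js:42)\n"
        "  at main loop\n"
    )
    paths = [
        "ui/feed/story_view.js",
        "ui/chat/thread_list.js",
        "core/net/http_client.py",
    ]
    print(rank_files(select_lines(trace, k=1), paths))
```

Keeping the two steps separate means the line selector never has to know about the set of candidate files, and the retrieval step only sees a short query instead of the full trace.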
Tue 21 Jul (displayed time zone: Tijuana, Baja California)
12:10 - 13:10 | BUG LOCALIZATION AND TEST ISOLATION (Technical Papers) at Zoom. Chair(s): Mattia Fazzini (University of Minnesota). Public Live Stream/Recording. Registered participants should join via the Zoom link distributed in Slack.
12:10, 20m Talk | Scaffle: Bug Localization on Millions of Files (Technical Papers). Michael Pradel (University of Stuttgart), Vijayaraghavan Murali (Facebook, Inc.), Rebecca Qian (Facebook, Inc.), Mateusz Machalica (Facebook, Inc.), Erik Meijer, Satish Chandra (Facebook)
12:30, 20m Talk | Abstracting Failure-Inducing Inputs (Technical Papers). Rahul Gopinath, Alexander Kampmann, Nikolas Havrikov, Ezekiel Soremekun, Andreas Zeller (all CISPA Helmholtz Center for Information Security)
12:50, 20m Talk | Debugging the Performance of Maven’s Test Isolation: Experience Report (Technical Papers). Pengyu Nie (The University of Texas at Austin), Ahmet Celik (Facebook), Matthew Coley, Aleksandar Milicevic, Jonathan Bell (Northeastern University), Milos Gligoric (The University of Texas at Austin)