Scaffle: Bug Localization on Millions of Files (ISSTA 2020 - Technical Papers)

Who

Michael Pradel, Vijayaraghavan Murali, Rebecca Qian, Mateusz Machalica, Erik Meijer, Satish Chandra

Track

ISSTA 2020 Technical Papers

When

Tue 21 Jul 2020 12:10 - 12:30 at Zoom - BUG LOCALIZATION AND TEST ISOLATION Chair(s): Mattia Fazzini

Abstract

Despite all efforts to avoid bugs, software sometimes crashes in the field, leaving crash traces as the only information to localize the problem. Prior approaches on localizing where to fix the root cause of a crash do not scale well to ultra-large scale, heterogeneous code bases that contain millions of code files written in multiple programming languages. This paper presents Scaffle, the first scalable bug localization technique, which is based on the key insight to divide the problem into two easier sub-problems. First, a trained machine learning model predicts which lines of a raw crash trace are most informative for localizing the bug. Then, these lines are fed to an information retrieval-based search engine to retrieve file paths in the code base, predicting which file to change to address the crash. The approach does not make any assumptions about the format of a crash trace or the language that produces it. We evaluate Scaffle with tens of thousands of crash traces produced by a large-scale industrial code base that contains millions of possible bug locations and that powers tools used by billions of people. The results show that the approach correctly predicts the file to fix for 40% to 60% (50% to 70%) of all crash traces within the top-1 (top-5) predictions. Moreover, Scaffle improves over several baseline approaches, including an existing classification-based approach, a scalable variant of existing information retrieval-based approaches, and a set of hand-tuned, industrially deployed heuristics.
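The two-step idea in the abstract (first pick the informative trace lines, then use them as a query over file paths) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: `toy_line_scorer` is a hand-written stand-in for the trained model, plain token overlap stands in for the information retrieval engine, and the trace, file paths, and function names are hypothetical. The `top_k_accuracy` helper shows how a top-1 or top-5 figure like the ones reported would be computed from ranked predictions.

```python
import re
from typing import Callable, List, Sequence, Set


def tokenize(text: str) -> Set[str]:
    """Split a trace line or file path into lowercase alphanumeric tokens."""
    return {t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)}


def toy_line_scorer(line: str) -> float:
    """Hypothetical stand-in for the trained model (not the paper's model):
    scores 1.0 if the line contains something shaped like `name.ext`."""
    return 1.0 if re.search(r"\w+\.\w+", line) else 0.0


def localize(
    trace_lines: Sequence[str],
    file_paths: Sequence[str],
    score_line: Callable[[str], float],
    top_lines: int = 2,
    top_k: int = 5,
) -> List[str]:
    """Rank candidate files for a raw crash trace in two steps."""
    # Step 1: keep the lines that the scorer rates as most informative.
    scored = sorted(
        ((score_line(line), line) for line in trace_lines),
        key=lambda pair: pair[0],
        reverse=True,
    )
    selected = [(s, line) for s, line in scored[:top_lines] if s > 0]

    # Step 2: rank file paths by score-weighted token overlap with those lines
    # (an assumed, simplified retrieval function).
    ranked = []
    for path in file_paths:
        path_tokens = tokenize(path)
        score = sum(s * len(path_tokens & tokenize(line)) for s, line in selected)
        if score > 0:
            ranked.append((score, path))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by path
    return [path for _, path in ranked[:top_k]]


def top_k_accuracy(
    predictions: Sequence[Sequence[str]], truths: Sequence[str], k: int
) -> float:
    """Fraction of traces whose true file appears in the first k predictions."""
    hits = sum(1 for ranked, truth in zip(predictions, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical crash trace and code base, for illustration only.
trace = [
    "FATAL: unhandled exception in worker 7",
    "  at render (ui/feed/story_view.js:42)",
    "  at dispatch (core/event_loop.py:118)",
    "process exited with code 1",
]
files = [
    "ui/feed/story_view.js",
    "core/event_loop.py",
    "ui/feed/story_model.js",
    "docs/readme.md",
]

ranked = localize(trace, files, toy_line_scorer)
print(ranked)
```

Passing the line scorer as a parameter keeps the retrieval step independent of how relevance is predicted, so a learned model could replace the toy heuristic without touching the ranking code.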

DOI

https://doi.org/10.1145/3395363.3397356

Michael Pradel (University of Stuttgart, Germany)

Vijayaraghavan Murali (Facebook, Inc., United States)

Rebecca Qian (Facebook, Inc.)

Mateusz Machalica (Facebook, Inc., United States)

Erik Meijer

Satish Chandra (Facebook, United States)

Session Program

Tue 21 Jul
Displayed time zone: (GMT-07:00) Tijuana, Baja California

12:10 - 13:10	BUG LOCALIZATION AND TEST ISOLATION (Technical Papers) at Zoom
	Chair(s): Mattia Fazzini (University of Minnesota)
	Public Live Stream/Recording. Registered participants should join via the Zoom link distributed in Slack.

12:10 20m Talk	Scaffle: Bug Localization on Millions of Files (Technical Papers)
	Michael Pradel (University of Stuttgart), Vijayaraghavan Murali (Facebook, Inc.), Rebecca Qian (Facebook, Inc.), Mateusz Machalica (Facebook, Inc.), Erik Meijer, Satish Chandra (Facebook)
12:30 20m Talk	Abstracting Failure-Inducing Inputs (Technical Papers)
	Rahul Gopinath, Alexander Kampmann, Nikolas Havrikov, Ezekiel Soremekun, Andreas Zeller (all CISPA Helmholtz Center for Information Security)
12:50 20m Talk	Debugging the Performance of Maven’s Test Isolation: Experience Report (Technical Papers)
	Pengyu Nie (The University of Texas at Austin), Ahmet Celik (Facebook), Matthew Coley, Aleksandar Milicevic, Jonathan Bell (Northeastern University), Milos Gligoric (The University of Texas at Austin)