Are Neural Bug Detectors Comparable to Software Developers on Variable Misuse Bugs?
Debugging, that is, identifying and fixing bugs in software, is a central part of software development. Developers are therefore often confronted with the task of deciding whether a given code snippet contains a bug and, if so, where. Recently, data-driven methods have been employed to learn this task of bug detection, resulting, among other approaches, in so-called neural bug detectors. Neural bug detectors are trained on millions of buggy and correct code snippets.
Given this “neural learning” procedure, it seems likely that neural bug detectors perform similarly to human software developers on the specific task of finding bugs. In this work, we set out to substantiate or refute this hypothesis. We report the results of an empirical study with over 100 software developers that compares humans and neural bug detectors. As the detection task, we chose a specific class of bugs, variable misuse bugs, on which neural bug detectors have recently made significant progress.
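For readers unfamiliar with this bug class: a variable misuse bug occurs when code uses one variable where a different variable that is also in scope (typically of the same type) was intended, so the program still compiles and runs but computes the wrong result. The snippet below is an illustrative example constructed for this explanation, not taken from the study's benchmark; the function names `clamp_buggy` and `clamp_fixed` are hypothetical.

```python
def clamp_buggy(value: int, low: int, high: int) -> int:
    """Clamp value into [low, high]; contains a variable misuse bug."""
    if value < low:
        return low
    if value > high:
        return low  # BUG: variable misuse, `low` is used where `high` was intended
    return value


def clamp_fixed(value: int, low: int, high: int) -> int:
    """Same function with the misused variable replaced by the intended one."""
    if value < low:
        return low
    if value > high:
        return high
    return value


if __name__ == "__main__":
    # Both versions agree on inputs that never reach the buggy line...
    print(clamp_buggy(5, 0, 10), clamp_fixed(5, 0, 10))  # 5 5
    # ...but disagree once value exceeds the upper bound.
    print(clamp_buggy(15, 0, 10), clamp_fixed(15, 0, 10))  # 0 10
```

Detecting this bug means classifying `clamp_buggy` as buggy; localizing it means pointing at the single line `return low` inside the `value > high` branch.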
Our study shows that, although neural bug detectors see millions of such examples during training, software developers are slightly better than neural bug detectors when their individual answers are combined into a majority decision. Overall, we find a large overlap in performance, both for classifying code as buggy and for localizing the buggy line in the code. Compared to developers, however, one of the two evaluated neural bug detectors raises more false alarms.
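The abstract does not specify how the developers' majority decision is formed. The following is a minimal sketch under stated assumptions, not the study's procedure: each developer labels a snippet as buggy or correct and, if buggy, names one suspect line; the group label is the majority label, and the group localization is the most frequently named line. The tie-breaking rules, the `Answer` type, and the helper functions are all hypothetical. A false alarm is counted when a correct snippet is reported as buggy.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Answer:
    """One (hypothetical) answer for a single code snippet."""
    is_buggy: bool
    line: Optional[int] = None  # suspected buggy line, only set if is_buggy


def majority_decision(answers: Sequence[Answer]) -> Answer:
    """Aggregate individual answers by majority vote.

    Assumptions: a tie in the buggy/correct vote is resolved as 'correct',
    and a tie between candidate lines goes to the smallest line number.
    """
    buggy_votes = sum(a.is_buggy for a in answers)
    if buggy_votes * 2 <= len(answers):
        return Answer(is_buggy=False)
    # Count suspected lines only among answers that reported a bug.
    line_votes = Counter(a.line for a in answers if a.is_buggy and a.line is not None)
    if not line_votes:
        return Answer(is_buggy=True)
    best = max(line_votes.values())
    line = min(ln for ln, count in line_votes.items() if count == best)
    return Answer(is_buggy=True, line=line)


def is_false_alarm(predicted: Answer, truly_buggy: bool) -> bool:
    """A false alarm: a correct snippet reported as buggy."""
    return predicted.is_buggy and not truly_buggy


if __name__ == "__main__":
    votes = [Answer(True, 6), Answer(True, 6), Answer(True, 4), Answer(False)]
    print(majority_decision(votes))  # Answer(is_buggy=True, line=6)
```

Separating the buggy/correct vote from the line vote mirrors the two tasks the study distinguishes: classifying code as buggy and localizing the buggy line.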