How to Train Your Neural Bug Detector: Artificial vs Real Bugs
Real bug fixes found in open source repositories seem to be the perfect source for learning to localize and repair real bugs. Yet, the scale of existing bug fix collections is typically too small for training data-intensive neural approaches. Neural bug detectors are hence almost exclusively trained on artificial bugs, produced by mutating existing source code and thus easily obtainable at large scales. However, neural bug detectors trained on artificial bugs usually underperform when faced with real bugs. To address this shortcoming, we set out to explore the impact of training on real bug fixes at scale. Our systematic study compares neural bug detectors trained on real bug fixes, artificial bugs and mixtures of real and artificial bugs at various dataset scales and with varying training techniques. Based on our insights gained from training on a novel dataset of 33k real bug fixes, we were able to identify a training setting capable of significantly improving the performance of existing neural bug detectors by up to 170% on simple bugs in Python. In addition, our evaluation shows that further gains can be expected by increasing the size of the real bug fix dataset or the code dataset used for generating artificial bugs. To facilitate future research on neural bug detection, we release our real bug fix dataset, trained models and code.
Presentation (ASE23-HowToNBD_Cedric_Richter.pptx) | 2.94MiB
Wed 13 Sep
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
15:30 - 17:00 | Bug Detection | Research Papers / Journal-first Papers at Room D | Chair(s): Andreea Vescan Babes-Bolyai University
15:30 | 12m Talk | A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging | Research Papers | File Attached
15:42 | 12m Talk | Duplicate Bug Report Detection: How Far Are We? | Journal-first Papers | Ting Zhang Singapore Management University, DongGyun Han Royal Holloway, University of London, Venkatesh Vinayakarao Chennai Mathematical Institute, Ivana Clairine Irsan Singapore Management University, Bowen Xu North Carolina State University, Ferdian Thung Singapore Management University, David Lo Singapore Management University, Lingxiao Jiang Singapore Management University | Link to publication, DOI, File Attached
15:54 | 12m Talk | Neural SZZ Algorithm | Research Papers | LingXiao Tang Zhejiang University, Lingfeng Bao Zhejiang University, Xin Xia Huawei Technologies, Zhongdong Huang Zhejiang University | Pre-print
16:06 | 12m Talk | How to Train Your Neural Bug Detector: Artificial vs Real Bugs | Research Papers | Cedric Richter Carl von Ossietzky Universität Oldenburg / University of Oldenburg, Heike Wehrheim Carl von Ossietzky Universität Oldenburg / University of Oldenburg | Pre-print, File Attached
16:18 | 12m Talk | Detection of Java Basic Thread Misuses Based on Static Event Analysis | Research Papers | Baoquan Cui Institute of Software at Chinese Academy of Sciences, China, MiaoMiao Wang Technology Center of Software Engineering, ISCAS, China; University of Chinese Academy of Sciences, China, Chi Zhang State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China, Jiwei Yan Institute of Software at Chinese Academy of Sciences, China, Jun Yan Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jian Zhang Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences | File Attached
16:30 | 12m Full-paper | On effort-aware metrics for defect prediction | Journal-first Papers | Jonida Çarka University of Rome Tor Vergata, Matteo Esposito University of Rome Tor Vergata, Falessi Davide University of Rome Tor Vergata | DOI, File Attached
16:42 | 12m Talk | FLUX: Finding Bugs with LLVM IR Based Unit Test Crossovers | Research Papers | Eric Liu University of Toronto, Shengjie Xu University of Toronto, David Lie University of Toronto, Canada | Pre-print, File Attached