TSSB-3M: Mining single statement bugs at massive scale
Mon 23 May 2022 11:46 - 11:54 at Room 315+316 - Blended Technical Session 1 (Integration, Large-scale mining, and Software Ecosystems) Chair(s): Bogdan Vasilescu
Single statement bugs are one of the most important ingredients in the evaluation of modern bug detection and automatic program repair methods. By affecting only a single statement, single statement bugs represent a type of bug often overlooked by developers, while still being small enough to be detected and fixed by automatic methods.
With the rise of data-driven automatic repair, the availability of single statement bugs at the scale of millions of examples is more important than ever, not only for testing these methods but also for providing sufficient real-world examples for training. To provide access to bug fix datasets of this scale, we are releasing two datasets called SSB-9M and TSSB-3M.
While SSB-9M provides access to a collection of over 9M general single statement bug fixes from over 500K open source Python projects, TSSB-3M focuses on over 3M single statement bugs which can be fixed solely by a single statement change. To facilitate future research and empirical investigations, we annotated each bug fix with one of 20 single statement bug (SStuB) patterns typical for Python, together with a characterization of the code change as a sequence of AST modifications. Our initial investigation shows that at least 40% of all single statement bug fixes mined fit at least one SStuB pattern, and that a majority (72%) of all bugs can be fixed with the same syntactic modifications as needed for fixing SStuBs.
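To make the notion of a fix that changes exactly one statement concrete, here is a minimal sketch using Python's standard `ast` module. It is not the authors' mining pipeline: the function names `changed_statements` and `is_single_statement_fix` are hypothetical, and the sketch assumes that comparing only top-level statements is enough for illustration, so an edit nested inside a function or loop counts as a change to the enclosing statement.

```python
import ast


def changed_statements(before: str, after: str):
    """Return (old, new) pairs of top-level statements whose ASTs differ.

    Returns None when the number of statements differs, i.e. a statement
    was added or removed rather than edited in place.
    """
    old_body = ast.parse(before).body
    new_body = ast.parse(after).body
    if len(old_body) != len(new_body):
        return None
    # ast.dump omits line/column attributes by default, so pure
    # formatting changes do not register as differences.
    return [
        (old, new)
        for old, new in zip(old_body, new_body)
        if ast.dump(old) != ast.dump(new)
    ]


def is_single_statement_fix(before: str, after: str) -> bool:
    """True if exactly one top-level statement was modified (illustrative only)."""
    pairs = changed_statements(before, after)
    return pairs is not None and len(pairs) == 1


buggy = "x = 1\ny = x + 1\nprint(y)\n"
fixed = "x = 1\ny = x - 1\nprint(y)\n"

if is_single_statement_fix(buggy, fixed):
    old, new = changed_statements(buggy, fixed)[0]
    # ast.unparse requires Python 3.9 or newer.
    print(ast.unparse(old), "->", ast.unparse(new))
```

Comparing AST dumps rather than raw text means whitespace and comment changes are ignored. A fuller treatment would descend into compound statements and record the individual AST edit operations, which this sketch does not attempt.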
Thu 19 May | Displayed time zone: Eastern Time (US & Canada)
04:00 - 04:50 | Session 9: Scaling & Cloud | Industry Track / Registered Reports / Data and Tool Showcase Track / Technical Papers | at MSR Main room - even hours | Chair(s): Lwin Khin Shar (Singapore Management University)
04:00 | 4m Talk | SniP: An Efficient Stack Tracing Framework for Multi-threaded Programs | Data and Tool Showcase Track | Arun KP (Indian Institute of Technology Kanpur), Saurabh Kumar (Indian Institute of Technology Kanpur), Debadatta Mishra, Biswabandan Panda (Indian Institute of Technology Bombay)
04:04 | 4m Talk | Tooling for Time- and Space-efficient git Repository Mining | Data and Tool Showcase Track | Fabian Heseding, Willy Scheibel, Jürgen Döllner (all Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam)
04:08 | 4m Talk | TSSB-3M: Mining single statement bugs at massive scale | Data and Tool Showcase Track | Cedric Richter, Heike Wehrheim (both Carl von Ossietzky Universität Oldenburg / University of Oldenburg)
04:12 | 7m Talk | Improved Business Outcomes from Cloud Applications – using Integrated Process and Runtime Product Data Mining | Industry Track
04:19 | 7m Talk | Improve Quality of Cloud Serverless Architectures through Software Repository Mining | Industry Track
04:26 | 4m Talk | Toward Granular Automatic Unit Test Case Generation | Registered Reports | Fabiano Pecorelli (Tampere University), Giovanni Grano (LocalStack), Fabio Palomba (University of Salerno), Harald C. Gall (University of Zurich), Andrea De Lucia (University of Salerno)
04:30 | 20m Live Q&A | Discussions and Q&A | Technical Papers