The field of Automated Program Repair (APR) has received increasing attention in recent years from both academia and leading IT companies. Its main goal is to repair software bugs automatically, thereby significantly reducing the cost of development and maintenance. Recent works use state-of-the-art deep learning models to predict correct patches, and these models require training on large amounts of data in almost every scenario. Despite this, readily accessible data in the field is very scarce. To contribute to related research, we present FixJS, a dataset containing bug-fixing information from ~2 million commits. The commits were gathered from GitHub and processed locally to obtain both the buggy (before the bug-fixing commit) and fixed (after the commit) versions of the same program. We focused on JavaScript functions, as JavaScript is one of the most popular programming languages globally and functions are first-class objects in it. The dataset includes more than 300,000 samples of such functions, including commit information, before/after states, and 3 source code representations.
Viktor Csuvik (Department of Software Engineering, MTA-SZTE Research Group on Artificial Intelligence, University of Szeged, Szeged, Hungary); László Vidács (University of Szeged, Hungary)