Defectors: A Large, Diverse Python Dataset for Defect Prediction
Defect prediction has been a popular research topic where deep learning has found numerous applications. However, these deep learning-based defect prediction models are often limited by the quality and size of their datasets. In this paper, we present Defectors, a large dataset for just-in-time and line-level defect prediction. Defectors consists of $\approx$ 213K source code files ($\approx$ 93K defective and $\approx$ 120K defect-free) that span across 24 popular Python projects. These projects come from 18 different domains, including machine learning, automation, and internet-of-things. Such a scale and diversity make Defectors a suitable dataset for training deep learning models, especially transformer models that require large and diverse datasets. We also foresee several application areas of our dataset including defect prediction, defect explanation, and automated program repair.
Tue 16 MayDisplayed time zone: Hobart change
14:35 - 15:15 | Defect PredictionData and Tool Showcase Track / Technical Papers at Meeting Room 109 Chair(s): Sarra Habchi Ubisoft | ||
14:35 12mTalk | Large Language Models and Simple, Stupid Bugs Technical Papers Kevin Jesse University of California at Davis, USA, Toufique Ahmed University of California at Davis, Prem Devanbu University of California at Davis, Emily Morgan University of California, Davis Pre-print | ||
14:47 12mTalk | The ABLoTS Approach for Bug Localization: is it replicable and generalizable?Distinguished Paper Award Technical Papers Feifei Niu University of Ottawa, Christoph Mayr-Dorn JOHANNES KEPLER UNIVERSITY LINZ, Wesley Assunção Johannes Kepler University Linz, Austria & Pontifical Catholic University of Rio de Janeiro, Brazil, Liguo Huang Southern Methodist University, Jidong Ge Nanjing University, Bin Luo Nanjing University, Alexander Egyed Johannes Kepler University Linz Pre-print File Attached | ||
14:59 6mTalk | LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations Data and Tool Showcase Track Catherine Tony Hamburg University of Technology, Markus Mutas Hamburg University of Technology, Nicolás E. Díaz Ferreyra Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology Pre-print | ||
15:05 6mTalk | Defectors: A Large, Diverse Python Dataset for Defect Prediction Data and Tool Showcase Track Parvez Mahbub Dalhousie University, Ohiduzzaman Shuvo Dalhousie University, Masud Rahman Dalhousie University Pre-print |