ReCover: a Curated Dataset for Regression Testing Research
It is recognized in the literature that finding representative data for regression testing research is non-trivial. In our experience in this field, existing datasets are often affected by issues that limit their applicability. Indeed, these datasets often lack fine-grained coverage information, reference software repositories that are no longer available, or do not allow researchers to readily build and run the software projects, e.g., to collect additional information. As a step towards better replicability and data availability in regression testing research, we introduce ReCover, a dataset of 114 pairs of successive versions from 28 open-source Java projects hosted on GitHub. ReCover is intended as a consolidation and enrichment of recent dedicated regression testing datasets proposed in the literature. Its goals are to overcome some of the issues described above and to make these datasets ready to use with a broader range of regression testing techniques. To this end, we developed a custom mining tool, which we also make available. The tool automatically processes two recent, large-scale regression testing datasets and retains only the pairs of software versions for which we were able to (1) retrieve the full source code; (2) build the software in a general-purpose Java/Maven environment (provided as a Docker container for ease of replication); and (3) compute fine-grained test coverage metrics. ReCover can be readily employed in regression testing studies, as it bundles, in a single package, the full buildable source code and detailed coverage reports for all projects. We envision that its use could foster regression testing research, improving replicability and long-term data availability.
Wed 18 May (displayed time zone: Eastern Time (US & Canada))
13:00 - 13:50 | Session 4: Software Quality (Bugs & Smells) | Data and Tool Showcase Track / Technical Papers at MSR Main room - odd hours | Chair(s): Maxime Lamothe (Polytechnique Montreal, Montreal, Canada), Mahmoud Alfadel (University of Waterloo)
13:00 | 7m Talk | Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue | Technical Papers | Rui Shu (North Carolina State University), Tianpei Xia (North Carolina State University), Laurie Williams (North Carolina State University), Tim Menzies (North Carolina State University)
13:07 | 7m Talk | To What Extent do Deep Learning-based Code Recommenders Generate Predictions by Cloning Code from the Training Set? | Technical Papers | Matteo Ciniselli (Università della Svizzera Italiana), Luca Pascarella (Università della Svizzera italiana (USI)), Gabriele Bavota (Software Institute, USI Università della Svizzera italiana) | Pre-print
13:14 | 7m Talk | How to Improve Deep Learning for Software Analytics (a case study with code smell detection) | Technical Papers | Pre-print
13:21 | 7m Talk | Using Active Learning to Find High-Fidelity Builds | Technical Papers | Harshitha Menon (Lawrence Livermore National Lab), Konstantinos Parasyris (Lawrence Livermore National Laboratory), Todd Gamblin (Lawrence Livermore National Laboratory), Tom Scogland (Lawrence Livermore National Laboratory) | Pre-print
13:28 | 4m Talk | ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction | Data and Tool Showcase Track | Hossein Keshavarz (David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada), Mei Nagappan (University of Waterloo) | Pre-print
13:32 | 4m Talk | ReCover: a Curated Dataset for Regression Testing Research | Data and Tool Showcase Track | Francesco Altiero (Università degli Studi di Napoli Federico II), Anna Corazza (Università degli Studi di Napoli Federico II), Sergio Di Martino (Università degli Studi di Napoli Federico II), Adriano Peron (Università degli Studi di Napoli Federico II), Luigi Libero Lucio Starace (Università degli Studi di Napoli Federico II)
13:36 | 14m Live Q&A | Discussions and Q&A | Technical Papers