Mon 15 May (displayed time zone: Hobart)

11:00 - 11:45 | SE for ML (Data and Tool Showcase Track / Technical Papers) at Meeting Room 110. Chair(s): Sarah Nadi (University of Alberta)
- 11:24 | 6m Talk | DeepScenario: An Open Driving Scenario Dataset for Autonomous Driving System Testing (Data and Tool Showcase Track). Chengjie Lu (Simula Research Laboratory and University of Oslo), Tao Yue (Simula Research Laboratory), Shaukat Ali (Simula Research Laboratory). Pre-print
- 11:30 | 6m Talk | NICHE: A Curated Dataset of Engineered Machine Learning Projects in Python (Data and Tool Showcase Track). Ratnadira Widyasari, Zhou Yang, Ferdian Thung, Sheng Qin Sim, Fiona Wee, Camellia Lok, Jack Phan, Haodi Qi, Constance Tan, Qijin Tay, David Lo (Singapore Management University)
- 11:36 | 6m Talk | PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages (Data and Tool Showcase Track). Wenxin Jiang (Purdue University), Nicholas Synovic (Loyola University Chicago), Purvish Jajal (Purdue University), Taylor R. Schorlemmer (Purdue University), Arav Tewari (Purdue University), Bhavesh Pareek (Purdue University), George K. Thiruvathukal (Loyola University Chicago and Argonne National Laboratory), James C. Davis (Purdue University). Pre-print

11:50 - 12:35 | Documentation + Q&A I (Data and Tool Showcase Track / Technical Papers) at Meeting Room 109. Chair(s): Ahmad Abdellatif (Concordia University)
- 12:26 | 6m Talk | GIRT-Data: Sampling GitHub Issue Report Templates (Data and Tool Showcase Track). Nafiseh Nikeghbal (Sharif University of Technology), Amir Hossein Kargaran (LMU Munich), Abbas Heydarnoori (Bowling Green State University), Hinrich Schütze (LMU Munich). Pre-print

11:50 - 12:35 | Software Evolution & Analysis (Data and Tool Showcase Track / Technical Papers) at Meeting Room 110. Chair(s): Michael Schlichtig
- 11:56 | 6m Talk | DGMF: Fast Generation of Comparable, Updatable Dependency Graphs for Software Repositories (Data and Tool Showcase Track). Tobias Litzenberger (TU Dortmund University), Johannes Düsing (TU Dortmund University), Ben Hermann (TU Dortmund)
- 12:02 | 6m Talk | Enabling Analysis and Reasoning on Software Systems through Knowledge Graph Representation (Data and Tool Showcase Track)
- 12:08 | 6m Talk | microSecEnD: A Dataset of Security-Enriched Dataflow Diagrams for Microservice Applications (Data and Tool Showcase Track). Simon Schneider, Tufan Özen, Michael Chen, Riccardo Scandariato (Hamburg University of Technology)

14:20 - 15:15 | Understanding Defects (Registered Reports / Data and Tool Showcase Track / Technical Papers) at Meeting Room 110. Chair(s): Matteo Paltenghi (University of Stuttgart, Germany)
- 14:44 | 6m Talk | Semantically-enriched Jira Issue Tracking Data (Data and Tool Showcase Track). Themistoklis Diamantopoulos, Dimitrios-Nikitas Nastos, Andreas Symeonidis (Electrical and Computer Engineering Dept., Aristotle University of Thessaloniki). Pre-print
- 14:56 | 6m Talk | HasBugs - Handpicked Haskell Bugs (Data and Tool Showcase Track)

15:45 - 16:30 | Process Automation & DevOps (Data and Tool Showcase Track / Technical Papers / Industry Track) at Meeting Room 110. Chair(s): Andy Meneely (Rochester Institute of Technology)
- 16:09 | 6m Talk | EGAD: A Moldable Tool for GitHub Action Analysis (Data and Tool Showcase Track). Pablo Valenzuela-Toledo (University of Bern), Alexandre Bergel (University of Chile), Timo Kehrer (University of Bern), Oscar Nierstrasz (University of Bern). DOI, Pre-print

16:35 - 17:20 | Security (Technical Papers / Data and Tool Showcase Track) at Meeting Room 110. Chair(s): Chanchal K. Roy (University of Saskatchewan)
- 17:11 | 6m Talk | SecretBench: A Dataset of Software Secrets (Data and Tool Showcase Track). Setu Kumar Basak, Lorenzo Neil, Bradley Reaves, Laurie Williams (North Carolina State University). Pre-print
Tue 16 May (displayed time zone: Hobart)

11:00 - 11:45 | Documentation + Q&A II (Technical Papers / Data and Tool Showcase Track) at Meeting Room 109. Chair(s): Maram Assi (Queen's University)
- 11:24 | 6m Talk | GiveMeLabeledIssues: An Open Source Issue Recommendation System (Data and Tool Showcase Track). Joseph Vargovich, Fabio Marcos De Abreu Santos, Jacob Penney, Marco Gerosa, Igor Steinmacher (Northern Arizona University). Pre-print, Media Attached
- 11:30 | 6m Talk | DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories (Data and Tool Showcase Track)
- 11:36 | 6m Talk | PENTACET data - 23 Million Code Comments and 500,000 SATD comments (Data and Tool Showcase Track). Murali Sridharan, Leevi Rantala, Mika Mäntylä (University of Oulu)

11:00 - 11:45 | Code Smells (Technical Papers / Industry Track / Data and Tool Showcase Track) at Meeting Room 110. Chair(s): Md Tajmilur Rahman (Gannon University)
- 11:24 | 6m Talk | CLEAN++: Code Smells Extraction for C++ (Data and Tool Showcase Track). Tom Mashiach, Bruno Sotto-Mayor (Ben Gurion University of the Negev, Israel), Gal Kaminka (Bar Ilan University, Israel), Meir Kalech (Ben Gurion University of the Negev, Israel)
- 11:30 | 6m Talk | DACOS-A Manually Annotated Dataset of Code Smells (Data and Tool Showcase Track). Himesh Nandani, Mootez Saad, Tushar Sharma (Dalhousie University). Pre-print, File Attached

11:50 - 12:35 | Development Tools & Practices II (Data and Tool Showcase Track / Industry Track / Technical Papers / Registered Reports) at Meeting Room 109. Chair(s): Banani Roy (University of Saskatchewan)
- 12:02 | 6m Talk | A Dataset of Bot and Human Activities in GitHub (Data and Tool Showcase Track). Natarajan Chidambaram (University of Mons), Alexandre Decan (University of Mons; F.R.S.-FNRS), Tom Mens (University of Mons)

11:50 - 12:35 | Software Libraries & Ecosystems (Technical Papers / Industry Track / Data and Tool Showcase Track) at Meeting Room 110. Chair(s): Mehdi Keshani (Delft University of Technology)
- 12:14 | 6m Talk | PyMigBench: A Benchmark for Python Library Migration (Data and Tool Showcase Track). Mohayeminul Islam (University of Alberta), Ajay Jha (North Dakota State University), Sarah Nadi (University of Alberta), Ildar Akhmetov (University of Alberta). Pre-print

13:45 - 14:30 | Software Quality (Data and Tool Showcase Track / Technical Papers) at Meeting Room 110. Chair(s): Tushar Sharma (Dalhousie University)
- 14:21 | 6m Talk | Snapshot Testing Dataset (Data and Tool Showcase Track)

14:35 - 15:15 | Defect Prediction (Data and Tool Showcase Track / Technical Papers) at Meeting Room 109. Chair(s): Sarra Habchi (Ubisoft)
- 14:59 | 6m Talk | LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations (Data and Tool Showcase Track). Catherine Tony, Markus Mutas, Nicolás E. Díaz Ferreyra, Riccardo Scandariato (Hamburg University of Technology). Pre-print
- 15:05 | 6m Talk | Defectors: A Large, Diverse Python Dataset for Defect Prediction (Data and Tool Showcase Track). Parvez Mahbub, Ohiduzzaman Shuvo, Masud Rahman (Dalhousie University). Pre-print

14:35 - 15:15 | Human Aspects (Technical Papers / Data and Tool Showcase Track) at Meeting Room 110. Chair(s): Alexander Serebrenik (Eindhoven University of Technology)
- 15:05 | 6m Talk | GitHub OSS Governance File Dataset (Data and Tool Showcase Track). Yibo Yan (University of California, Davis), Seth Frey (University of California, Davis), Amy Zhang (University of Washington, Seattle), Vladimir Filkov (University of California, Davis), Likang Yin (University of California, Davis). Pre-print
Call for Papers
The MSR Data and Tools Showcase Track aims to actively promote and recognize the creation of reusable datasets and tools that are designed and built not only for a specific research project, but for the MSR community as a whole. These datasets and tools should enable other practitioners and researchers to jumpstart their research efforts and should also allow earlier work to be reproduced. MSR Data and Tools Showcase papers can describe datasets or tools built by the authors for use by other practitioners or researchers, and/or describe the use of tools built by others to obtain specific research results.
MSR’23 Data and Tools Showcase Track will accept two types of submissions: (1) data showcase papers and (2) reusable tool showcase papers.
Data showcase submissions are expected to include:
- a description of the data source,
- a description of the methodology used to gather the data (including provenance and the tool used to create/generate/gather the data, if any),
- a description of the storage mechanism, including a schema if applicable,
- if the data has been used by the authors or others, a description of how this was done, including references to previously published papers,
- a description of the originality of the dataset (that is, even if the dataset has been used in a published paper, its complete description must be unpublished) and of similar existing datasets (if any),
- ideas for future research questions that could be answered using the dataset,
- ideas for further improvements that could be made to the dataset, and
- any limitations and/or challenges in creating or using the dataset.
Reusable tool showcase submissions are expected to include:
- a description of the tool, covering its background, motivation, novelty, overall architecture, detailed design, and preliminary evaluation, as well as a link to download or access the tool,
- a description of the tool's design and of how to use it in practice,
- clear installation instructions and an example dataset that allow the reviewers to run the tool,
- if the tool has been used by the authors or others, a description of how the tool was used, including references to previously published papers,
- ideas for future reusability of the tool, and
- any limitations of using the tool.
The dataset or tool should be made available at the time the paper is submitted for review, but it will be treated as confidential until the paper is published. The dataset or tool should include detailed instructions on how to set up the environment (e.g., a requirements.txt file) and how to use the dataset or tool (e.g., how to import the data or access it once imported, or how to run the tool on a running example).
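For illustration only, a minimal usage sketch of this kind might look as follows; the file name, column name, and the pandas dependency are hypothetical placeholders, not a required layout:

```python
# Hypothetical usage example accompanying a data showcase submission.
# Environment setup, e.g. in requirements.txt:  pandas>=1.5
import pandas as pd

# Load the dataset from its CSV export (path and schema are illustrative).
commits = pd.read_csv("data/commits.csv")

# Running example: count commits per repository.
print(commits.groupby("repo_name").size().sort_values(ascending=False).head())
```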
At a minimum, upon publication of the paper, the authors should archive the data or tool in a persistent repository that can provide a digital object identifier (DOI), such as zenodo.org, figshare.com, Archive.org, or an institutional repository. In addition, the DOI-based citation of the dataset or tool should be included in the camera-ready version of the paper. GitHub provides an easy way to make source code citable (with third-party tools and with a CITATION file).
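For example, a DOI-based citation of an archived dataset could take the following BibTeX form in the camera-ready version; every field below is a hypothetical placeholder:

```bibtex
@misc{doe2023example,
  author    = {Doe, Jane and Roe, Richard},
  title     = {Example Dataset for Mining Software Repositories},
  year      = {2023},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.0000000},
  url       = {https://doi.org/10.5281/zenodo.0000000}
}
```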
Data and Tools Showcase submissions are not:
- empirical studies, or
- datasets based on poorly explained or untrustworthy heuristics for data collection, or the results of a trivial application of generic tools.
If custom tools have been used to create the dataset, we expect the paper to be accompanied by the source code of the tools, along with clear documentation on how to run the tools to recreate the dataset. The tools should be open source, accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. If you cannot provide the source code or the source code clause is not applicable (e.g., because the dataset consists of qualitative data), please provide a short explanation of why this is not possible.
Evaluation Criteria
The review criteria for Data and Tools Showcase submissions are as follows:
- value, usefulness, and reusability of the datasets or tools.
- quality of the presentation.
- clarity of relation with related work and its relevance to mining software repositories.
- availability of the datasets or tools.
Important Dates
- Paper Deadline: Thursday 26th January 2023
- Author Notification: Tuesday 7th March 2023
- Camera Ready Deadline: Thursday 16th March 2023
Submission
Submit your paper (maximum 4 pages, plus 1 additional page of references) via the HotCRP submission site: https://msr2023-data-tool.hotcrp.com/.
Submitted papers will undergo single-anonymous peer review. We opt for single-anonymous review (as opposed to the double-anonymous review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies; such references are likely to disclose the authors' identity.
To make research datasets and research software accessible and citable, we further encourage authors to adhere to the FAIR principles, i.e., data should be Findable, Accessible, Interoperable, and Reusable.
Submissions must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options).
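A minimal LaTeX skeleton that follows these instructions might look as follows; the title, author, and affiliation are placeholders:

```latex
% Minimal sketch of a conforming submission (placeholders throughout).
\documentclass[10pt,conference]{IEEEtran} % no compsoc/compsocconf options
\begin{document}
\title{Your Data or Tool Showcase Title}
\author{\IEEEauthorblockN{Jane Doe}
        \IEEEauthorblockA{Example University}}
\maketitle
% ... up to 4 pages of content plus 1 page of references ...
\end{document}
```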
Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship. Please read the ACM Policy on Plagiarism, Misrepresentation, and Falsification and the IEEE - Introduction to the Guidelines for Handling Plagiarism Complaints before submitting.
Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to register and present the results at the MSR 2023 conference. All accepted contributions will be published in the conference electronic proceedings.