Call for Papers
The MSR Data/Tool Showcase track aims to actively promote and recognize the creation of reusable datasets and tools that are designed and built not only for a specific research project, but for the MSR community as a whole. These datasets and tools should enable other practitioners and researchers to jumpstart their own research efforts, and also enable the reproducibility of earlier work. The MSR Data/Tool Showcase papers can be descriptions of datasets or tools built by the authors that can be used by other practitioners or researchers, and/or descriptions of the use of tools built by others to obtain specific research results.
Types of MSR’22 Data and Tool Showcase Track Submissions
The MSR’22 Data/Tool Showcase Track accepts two types of submissions: (1) data showcase papers and (2) reusable tool showcase papers.
Submissions are limited to a maximum of 4 pages, plus 1 additional page of references. Papers should be submitted via the HotCRP submission site on or before Thursday 27th January 2022.
The Review Criteria for the Data/Tool Showcase submissions are as follows:
- The value, usefulness, and reusability of the datasets or tools.
- The quality of the presentation.
- The clarity of the relation to related work and its relevance to mining software repositories.
- The availability of the datasets or tools.
1. Data Showcase
MSR Data showcase submissions are expected to include:
- A description of the data source,
- A description of the methodology used to gather the data (including provenance and the tool used to create/generate/gather the data, if any),
- A description of the storage mechanism, including a schema if applicable,
- If the data has been used by the authors or others, a description of how this was done including references to previously published papers,
- A description of the originality of the data set (that is, even if the data set has been used in a published paper, its complete description must be unpublished) and similar existing datasets (if any),
- A description of the design of the tool, and how to use the tool in practice,
- Ideas for future research questions that could be answered using the data set,
- Ideas for further improvements that could be made to the data set, and
- Any limitations and/or challenges in creating or using the data set.
2. Reusable Tool Showcase
MSR Reusable Tool showcase submissions are expected to include:
- A description of the tool, which includes the background, motivation, novelty, overall architecture, detailed design, and preliminary evaluation of the tool, as well as the link to download or access the tool.
- A description of the design of the tool, and how to use the tool in practice.
- Clear installation instructions and example data set that allow the reviewers to run the tool.
- If the tool has been used by the authors or others, a description of how the tool was used, including references to previously published papers.
- Ideas for future reusability of the tools.
- Any limitations of using the tools.
The dataset/tool should be made available at the time of submission of the paper for review but will be considered confidential until publication of the paper. The dataset/tool should include detailed instructions about how to set up the environment (e.g., requirements.txt), how to use the datasets/tools (e.g., how to import the data or how to access the data once it has been imported, how to use the tool with a running example).
At a minimum, upon publication of the paper, the authors should archive the data or tool on a persistent repository that can provide a digital object identifier (DOI) such as zenodo.org, figshare.com, Archive.org, or institutional repositories. In addition, the DOI-based citation of the dataset or the tool should be included in the camera-ready version of the paper.
Data/Tool showcase submissions are not:
- Empirical studies.
- Datasets that are based on poorly explained or untrustworthy heuristics for data collection, or results of trivial application of generic tools.
If custom tools have been used to create the data set, we expect the paper to be accompanied by the source code of the tools, along with clear documentation on how to run the tools to recreate the data set. The tools should be open source, accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. GitHub provides an easy way to make source code citable. If you cannot provide the source code or the source code clause is not applicable (e.g., because the data set consists of qualitative data), please provide a short explanation of why this is not possible.
Important Dates
- Abstract Deadline: Tuesday 25th January 2022
- Paper Deadline: Thursday 27th January 2022
- Author Notification: March 8
- Camera Ready Deadline: Late March
Submission
Please submit your data and tool paper(s) (maximum 4 pages, plus 1 additional page of references) via the HotCRP submission site on or before Thursday 27th January 2022.
Submitted papers will undergo single-blind peer review. We opt for single-blind peer review (as opposed to the double-blind peer review of the main track) because of the requirement above to describe how the data has been used in previous studies, including bibliographic references to those studies. Such references are likely to disclose the authors’ identity.
To make research datasets and tools accessible and citable, we further encourage authors to attend to the FAIR rules, i.e., datasets and tools should be: Findable, Accessible, Interoperable, and Reusable.
All authors should use the official “ACM Primary Article Template”, as can be obtained from the ACM Proceedings Template page. LaTeX users should use the sigconf option, as well as the review option (to produce line numbers for easy reference by the reviewers). To that end, the following LaTeX code can be placed at the start of the LaTeX document:
\documentclass[sigconf,review]{acmart}
\acmConference[MSR 2022]{MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories}{May 23–24, 2022}{Pittsburgh, PA, USA}
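As a minimal sketch of where these two lines sit in a submission, the skeleton below uses standard acmart commands. The title, author name, affiliation, email address, and the bibliography file name are placeholders chosen for illustration and are not part of this call.

```latex
\documentclass[sigconf,review]{acmart}
\acmConference[MSR 2022]{MSR '22: Proceedings of the 19th International Conference on Mining Software Repositories}{May 23–24, 2022}{Pittsburgh, PA, USA}

\begin{document}

% Placeholder metadata: replace with the real title and author details.
\title{Placeholder Title of a Data Showcase Paper}
\author{Jane Doe}
\affiliation{%
  \institution{Example University}
  \city{Example City}
  \country{Example Country}}
\email{jane.doe@example.org}

% In acmart, the abstract must appear before \maketitle.
\begin{abstract}
One-paragraph summary of the dataset or tool.
\end{abstract}

\maketitle

\section{Introduction}
Body text (at most 4 pages, plus 1 additional page of references).

% ACM reference format; references.bib is a hypothetical file name.
\bibliographystyle{ACM-Reference-Format}
\bibliography{references}

\end{document}
```

Because this track uses single-blind review, the sketch keeps the author block visible and does not use acmart's anonymous option.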
We encourage authors to upload their paper info early (the PDF can be submitted later). All submissions must adhere to the following requirements:
- Submissions must not exceed the page limit (4 pages plus 1 additional page of references for short papers). The page limit is strict, and it will not be possible to purchase additional pages at any point in the process (including after acceptance).
- Submissions must strictly conform to the ACM formatting instructions. Alterations of spacing, font size, and other changes that deviate from the instructions may result in desk rejection without further review.
Any submission that does not comply with these requirements is likely to be desk rejected by the PC Chairs without further review. In addition, by submitting to the MSR Data/Tool Showcase Track, the authors acknowledge that they are aware of and agree to be bound by the following policies:
- The ACM Policy and Procedures on Plagiarism and the IEEE Plagiarism FAQ. In particular, papers submitted to MSR 2022 must not have been published elsewhere and must not be under review or submitted for review elsewhere whilst under consideration for MSR 2022. Contravention of this concurrent submission policy will be deemed a serious breach of scientific ethics, and appropriate action will be taken in all such cases (including immediate rejection and reporting of the incident to ACM/IEEE). To check for double submission and plagiarism issues, the chairs reserve the right to (1) share the list of submissions with the PC Chairs of other conferences with overlapping review periods and (2) use external plagiarism detection software, under contract to the ACM or IEEE, to detect violations of these policies.
- The authorship policy of the ACM and the authorship policy of the IEEE.
Upon notification of acceptance, all authors of accepted papers will be asked to fill out a copyright form and will receive further instructions for preparing the camera-ready version of their papers. At least one author of each paper is expected to register and present the paper at the MSR 2022 conference. All accepted contributions will be published in the electronic proceedings of the conference.
For enquiries, please contact the MSR Data/Tool Co-Chairs at chakkrit@monash.edu and xin.xia@acm.org.
Wed 18 May (displayed time zone: Eastern Time, US & Canada)
03:00 - 03:50 | Session 2: Maintenance (Issues & Smells) | Technical Papers / Registered Reports / Data and Tool Showcase Track / Industry Track | MSR Main room - odd hours | Chair(s): Alessio Ferrari (CNR-ISTI)
- 03:00 4m Talk | An Alternative Issue Tracking Dataset of Public Jira Repositories | Data and Tool Showcase Track | Lloyd Montgomery (Universität Hamburg), Clara Marie Lüders (University of Hamburg), Walid Maalej (University of Hamburg)
05:00 - 05:50 | Session 3: Introspection, Vision, and Human Aspects | Technical Papers / Data and Tool Showcase Track / Industry Track / Registered Reports | MSR Main room - odd hours | Chair(s): Alexander Serebrenik (Eindhoven University of Technology), Sebastian Baltes (SAP SE & University of Adelaide)
- 05:11 4m Talk | The General Index of Software Engineering Papers | Data and Tool Showcase Track
13:00 - 13:50 | Session 4: Software Quality (Bugs & Smells) | Data and Tool Showcase Track / Technical Papers | MSR Main room - odd hours | Chair(s): Maxime Lamothe (Polytechnique Montreal, Montreal, Canada), Mahmoud Alfadel (University of Waterloo)
- 13:28 4m Talk | ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction | Data and Tool Showcase Track | Hossein Keshavarz (David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada), Mei Nagappan (University of Waterloo)
- 13:32 4m Talk | ReCover: a Curated Dataset for Regression Testing Research | Data and Tool Showcase Track | Francesco Altiero (Università degli Studi di Napoli Federico II), Anna Corazza (Università degli Studi di Napoli Federico II), Sergio Di Martino (Università degli Studi di Napoli Federico II), Adriano Peron (Università degli Studi di Napoli Federico II), Luigi Libero Lucio Starace (Università degli Studi di Napoli Federico II)
14:00 - 14:50 | Session 5: Communication & Domains | Data and Tool Showcase Track / Technical Papers | MSR Main room - even hours | Chair(s): Masud Rahman (Dalhousie University), Mahmoud Alfadel (University of Waterloo)
- 14:14 4m Talk | SoCCMiner: A Source Code-Comments and Comment-Context Miner | Data and Tool Showcase Track | Murali Sridharan (University of Oulu), Mika Mäntylä (University of Oulu), Maëlick Claes (University of Oulu), Leevi Rantala (University of Oulu)
- 14:18 4m Talk | SLNET: A Redistributable Corpus of 3rd-party Simulink Models | Data and Tool Showcase Track | Sohil Lal Shrestha (The University of Texas at Arlington), Shafiul Azam Chowdhury (University of Texas at Arlington), Christoph Csallner (University of Texas at Arlington)
- 14:22 4m Talk | SOSum: A Dataset of Stack Overflow Post Summaries | Data and Tool Showcase Track | Bonan Kou (Purdue University), Yifeng Di (Purdue University), Muhao Chen (University of Southern California), Tianyi Zhang (Purdue University)
- 14:26 4m Talk | Inspect4py: A Knowledge Extraction Framework for Python Code Repositories | Data and Tool Showcase Track
- 14:30 4m Talk | DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research | Data and Tool Showcase Track | Keerthana Muthu Subash (Carleton University, Canada), Lakshmi Prasanna Kumar (Carleton University, Canada), Sri Lakshmi Vadlamani (Carleton University, Canada), Preetha Chatterjee (Drexel University, USA), Olga Baysal (Carleton University)
20:00 - 20:50 | Session 6: Maintenance & Testing | Data and Tool Showcase Track / Technical Papers | MSR Main room - even hours | Chair(s): Ajay Jha (University of Alberta), Amjed Tahir (Massey University)
- 20:25 4m Talk | Methods2Test: A dataset of focal methods mapped to test cases | Data and Tool Showcase Track | Michele Tufano (Microsoft), Shao Kun Deng (Microsoft Corporation), Neel Sundaresan (Microsoft Corporation), Alexey Svyatkovskiy
- 20:29 4m Talk | npm-filter: Automating the mining of dynamic information from npm packages | Data and Tool Showcase Track
- 20:33 4m Talk | ManyTypes4TypeScript: A Comprehensive TypeScript Dataset for Sequence-Based Type Inference | Data and Tool Showcase Track | Kevin Jesse (University of California, Davis), Prem Devanbu (Department of Computer Science, University of California, Davis)
21:00 - 21:50 | Session 7: Developer Wellbeing & Project Communication | Technical Papers / Data and Tool Showcase Track / Industry Track | MSR Main room - odd hours | Chair(s): Bram Adams (Queen's University, Kingston, Ontario)
- 21:14 4m Talk | The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories | Data and Tool Showcase Track | Melanie Warrick (University of Vermont), Samuel F. Rosenblatt (University of Vermont), Jean-Gabriel Young (University of Vermont), amanda casari (Open Source Programs Office, Google), Laurent Hébert-Dufresne (University of Vermont), James P. Bagrow (University of Vermont)
- 21:18 4m Talk | The Unexplored Treasure Trove of Phabricator Code Reviews | Data and Tool Showcase Track | Gunnar Kudrjavets (University of Groningen), Nachiappan Nagappan (Microsoft Research), Ayushi Rastogi (University of Groningen, The Netherlands)
- 21:22 4m Talk | The Unsolvable Problem or the Unheard Answer? A Dataset of 24,669 Open-Source Software Conference Talks | Data and Tool Showcase Track | Kimberly Truong (Oregon State University), Courtney Miller (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University, USA), Christian Kästner (Carnegie Mellon University)
- 21:26 4m Talk | Exploring Apache Incubator Project Trajectories with APEX | Data and Tool Showcase Track | Anirudh Ramchandran (University of California, Davis), Likang Yin (University of California, Davis), Vladimir Filkov (University of California at Davis)
Thu 19 May (displayed time zone: Eastern Time, US & Canada)
04:00 - 04:50 | Session 9: Scaling & Cloud | Industry Track / Registered Reports / Data and Tool Showcase Track / Technical Papers | MSR Main room - even hours | Chair(s): Lwin Khin Shar (Singapore Management University)
- 04:00 4m Talk | SniP: An Efficient Stack Tracing Framework for Multi-threaded Programs | Data and Tool Showcase Track | Arun KP (Indian Institute of Technology Kanpur), Saurabh Kumar (Indian Institute of Technology Kanpur), Debadatta Mishra, Biswabandan Panda (Indian Institute of Technology Bombay)
- 04:04 4m Talk | Tooling for Time- and Space-efficient git Repository Mining | Data and Tool Showcase Track | Fabian Heseding (Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam), Willy Scheibel (Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam), Jürgen Döllner (Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam)
- 04:08 4m Talk | TSSB-3M: Mining single statement bugs at massive scale | Data and Tool Showcase Track | Cedric Richter (Carl von Ossietzky Universität Oldenburg / University of Oldenburg), Heike Wehrheim (Carl von Ossietzky Universität Oldenburg / University of Oldenburg)
20:00 - 20:50 | Session 12: Integration & Large-Scale Mining | Technical Papers / Data and Tool Showcase Track | MSR Main room - even hours | Chair(s): Jin L.C. Guo (McGill University), Amjed Tahir (Massey University)
- 20:32 4m Talk | TwinDroid: A Dataset of Android app System call traces and Trace Generation Pipeline | Data and Tool Showcase Track | Asma Razgallah (Université du Québec à Chicoutimi, Canada), Raphael Khoury (Université du Québec à Chicoutimi, Canada), Jean-Baptiste Poulet (Université du Québec à Chicoutimi, Canada)
21:00 - 21:50 | Session 13: Security & Quality | Technical Papers / Data and Tool Showcase Track / Registered Reports / Industry Track | MSR Main room - odd hours | Chair(s): Gias Uddin (University of Calgary, Canada)
- 21:21 4m Talk | ECench: An Energy Bug Benchmark of Ethereum Client Software | Data and Tool Showcase Track | Jinyoung Kim (Sungkyunkwan University), Misoo Kim (Sungkyunkwan University), Eunseok Lee (Sungkyunkwan University)
Fri 20 May (displayed time zone: Eastern Time, US & Canada)
04:00 - 04:50 | Session 14: Software Quality | Technical Papers / Industry Track / Data and Tool Showcase Track | MSR Main room - even hours | Chair(s): Kla Tantithamthavorn (Monash University), Simone Scalabrino (University of Molise)
- 04:25 4m Talk | Constructing Dataset of Functionally Equivalent Java Methods Using Automated Test Generation Techniques | Data and Tool Showcase Track | Yoshiki Higo (Osaka University), Shinsuke Matsumoto (Osaka University), Shinji Kusumoto (Osaka University), Kazuya Yasuda (Hitachi, Ltd.)
11:00 - 11:50 | Session 15: Collaboration & Open Source | Registered Reports / Data and Tool Showcase Track / Technical Papers / Industry Track | MSR Main room - odd hours | Chair(s): Massimiliano Di Penta (University of Sannio, Italy), Fiorella Zampetti (University of Sannio, Italy)
- 11:07 4m Talk | FixJS: A Dataset of Bug-fixing JavaScript Commits | Data and Tool Showcase Track | Viktor Csuvik (Department of Software Engineering, MTA-SZTE Research Group on Artificial Intelligence, University of Szeged, Szeged, Hungary), László Vidács (University of Szeged, Hungary)
- 11:11 4m Talk | A Time Series-Based Dataset of Open-Source Software Evolution | Data and Tool Showcase Track | Bruno L. Sousa (UFMG), Mariza Bigonha (Professor at Federal University of Minas Gerais), Kecia A. M. Ferreira (CEFET-MG), Glaura C. Franco (UFMG)
- 11:15 4m Talk | LAGOON: An Analysis Tool for Open Source Communities | Data and Tool Showcase Track
- 11:19 4m Talk | A Versatile Dataset of Agile Open Source Software Projects | Data and Tool Showcase Track | Vali Tawosi (University College London), Afnan Al-Subaihin (University College London), Rebecca Moussa (University College London), Federica Sarro (University College London)
14:00 - 15:00 | Session 16: Non-functional Properties (Availability, Security, Legal Aspects) | Industry Track / Technical Papers / Registered Reports / Data and Tool Showcase Track | MSR Main room - even hours | Chair(s): Maxime Lamothe (Polytechnique Montreal, Montreal, Canada), Jin L.C. Guo (McGill University)
- 14:07 4m Talk | A Large-scale Dataset of (Open Source) License Text Variants (Data and Tool Showcase Award) | Data and Tool Showcase Track | Stefano Zacchiroli (Télécom Paris, Polytechnic Institute of Paris)
Mon 23 May (displayed time zone: Eastern Time, US & Canada)
11:00 - 12:30 | Blended Technical Session 1 (Integration, Large-scale mining, and Software Ecosystems) | Technical Papers / Data and Tool Showcase Track | Room 315+316 | Chair(s): Bogdan Vasilescu (Carnegie Mellon University, USA)
- 11:30 8m Talk | Dataset: Dependency Networks of Open Source Libraries Available Through CocoaPods, Carthage and Swift PM | Data and Tool Showcase Track
- 11:38 8m Talk | A Large-scale Dataset of (Open Source) License Text Variants (Data and Tool Showcase Award) | Data and Tool Showcase Track | Stefano Zacchiroli (Télécom Paris, Polytechnic Institute of Paris)
- 11:46 8m Talk | TSSB-3M: Mining single statement bugs at massive scale | Data and Tool Showcase Track | Cedric Richter (Carl von Ossietzky Universität Oldenburg / University of Oldenburg), Heike Wehrheim (Carl von Ossietzky Universität Oldenburg / University of Oldenburg)
- 11:54 8m Talk | LAGOON: An Analysis Tool for Open Source Communities | Data and Tool Showcase Track
- 12:02 8m Talk | The Unexplored Treasure Trove of Phabricator Code Reviews | Data and Tool Showcase Track | Gunnar Kudrjavets (University of Groningen), Nachiappan Nagappan (Microsoft Research), Ayushi Rastogi (University of Groningen, The Netherlands)
13:30 - 15:00 | Blended Technical Session 2 (Machine Learning and Information Retrieval) | Technical Papers / Data and Tool Showcase Track | Room 315+316 | Chair(s): Preetha Chatterjee (Drexel University, USA)
- 14:31 8m Talk | SOSum: A Dataset of Stack Overflow Post Summaries | Data and Tool Showcase Track | Bonan Kou (Purdue University), Yifeng Di (Purdue University), Muhao Chen (University of Southern California), Tianyi Zhang (Purdue University)
Tue 24 May (displayed time zone: Eastern Time, US & Canada)
09:00 - 10:30 | Blended Technical Session 3 (Smells and Maintenance) | Technical Papers / Mining Challenge / Registered Reports / Data and Tool Showcase Track | Room 315+316 | Chair(s): Andy Zaidman (Delft University of Technology)
- 09:45 8m Talk | npm-filter: Automating the mining of dynamic information from npm packages | Data and Tool Showcase Track
11:00 - 12:15 | Blended Technical Session 4 (Introspection, Vision, and Human Aspects) | Technical Papers / Registered Reports / Data and Tool Showcase Track | Room 315+316 | Chair(s): Ayushi Rastogi (University of Groningen, The Netherlands)
- 11:38 8m Talk | The General Index of Software Engineering Papers | Data and Tool Showcase Track
15:30 - 17:00 | Blended Technical Session 5 (Miscellaneous) | Technical Papers / Data and Tool Showcase Track / Mining Challenge | Room 315+316 | Chair(s): Luís Cruz (Delft University of Technology)
- 16:00 8m Talk | SLNET: A Redistributable Corpus of 3rd-party Simulink Models | Data and Tool Showcase Track | Sohil Lal Shrestha (The University of Texas at Arlington), Shafiul Azam Chowdhury (University of Texas at Arlington), Christoph Csallner (University of Texas at Arlington)
- 16:08 8m Talk | SoCCMiner: A Source Code-Comments and Comment-Context Miner | Data and Tool Showcase Track | Murali Sridharan (University of Oulu), Mika Mäntylä (University of Oulu), Maëlick Claes (University of Oulu), Leevi Rantala (University of Oulu)