MSR 2023
Melbourne, Australia
co-located with ICSE 2023
Tue 16 May 2023 11:36 - 11:42 at Meeting Room 109 - Documentation + Q&A II Chair(s): Maram Assi

Most self-admitted technical debt (SATD) research relies on non-probabilistic sampling for data selection, which weakens the generalizability of its empirical findings. A closer look reveals that several SATD studies are based on simple ("easy to find") code comments without contextual data (the preceding and succeeding source code). In this work, we address this gap with the PENTACET (or 5C) dataset. PENTACET is a large dataset of Curated Contextual Code Comments per Contributor and the most extensive SATD dataset. It was acquired by mining 9,096 open source Java projects with a total of 435 million LOC, and it captures bi-directional contextual information at all source code granularities in more than 26 million source code files. The outcome is a dataset with 23 million code comments, the source code context for each comment, and more than 500,000 comments labeled as SATD.
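To make the idea of bi-directional context concrete, the following is a minimal sketch of pairing a code comment with the source lines before and after it. It is not the PENTACET mining pipeline: the names `CommentWithContext`, `extract_line_comments`, and `window` are hypothetical, and the sketch assumes that only whole-line `//` comments are of interest, ignoring block comments and trailing comments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommentWithContext:
    line_no: int           # 1-based line number of the comment
    comment: str           # comment text without the leading "//"
    preceding: List[str]   # up to `window` source lines before the comment
    succeeding: List[str]  # up to `window` source lines after the comment


def extract_line_comments(source: str, window: int = 2) -> List[CommentWithContext]:
    """Collect whole-line `//` comments together with the lines around them.

    Assumption for illustration: a line counts as a comment only if it
    starts with `//` once leading whitespace is stripped.
    """
    lines = source.splitlines()
    results = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("//"):
            continue
        results.append(CommentWithContext(
            line_no=i + 1,
            comment=stripped[2:].strip(),
            preceding=lines[max(0, i - window):i],
            succeeding=lines[i + 1:i + 1 + window],
        ))
    return results


# Hypothetical Java snippet used only to exercise the function.
java_source = """class Cache {
    int size;
    // TODO: replace this hack with a proper eviction policy
    void evict() {
        size = 0;
    }
}"""

for item in extract_line_comments(java_source, window=1):
    print(item.line_no, item.comment)
    print("before:", item.preceding)
    print("after:", item.succeeding)
```

Keeping the preceding and succeeding lines next to each comment is what lets a later labeling step look at the code a comment refers to, rather than at the comment text alone.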

Tue 16 May

Displayed time zone: Hobart

11:00 - 11:45
Documentation + Q&A II
Technical Papers / Data and Tool Showcase Track at Meeting Room 109
Chair(s): Maram Assi Queen's University
11:00
12m
Talk
Understanding the Role of Images on Stack Overflow
Technical Papers
Dong Wang Kyushu University, Japan, Tao Xiao Nara Institute of Science and Technology, Christoph Treude University of Melbourne, Raula Gaikovina Kula Nara Institute of Science and Technology, Hideaki Hata Shinshu University, Yasutaka Kamei Kyushu University
11:12
12m
Talk
Do Subjectivity and Objectivity Always Agree? A Case Study with Stack Overflow Questions
Technical Papers
Saikat Mondal University of Saskatchewan, Masud Rahman Dalhousie University, Chanchal K. Roy University of Saskatchewan
11:24
6m
Talk
GiveMeLabeledIssues: An Open Source Issue Recommendation System
Data and Tool Showcase Track
Joseph Vargovich Northern Arizona University, Fabio Marcos De Abreu Santos Northern Arizona University, USA, Jacob Penney Northern Arizona University, Marco Gerosa Northern Arizona University, Igor Steinmacher Northern Arizona University
11:30
6m
Talk
DocMine: A Software Documentation-Related Dataset of 950 GitHub Repositories
Data and Tool Showcase Track
11:36
6m
Talk
PENTACET data - 23 Million Code Comments and 500,000 SATD comments
Data and Tool Showcase Track
Murali Sridharan University of Oulu, Leevi Rantala University of Oulu, Mika Mäntylä University of Oulu