Characterizing and Predicting Good First Issues (ESEM 2021 - Technical Papers)

Who

Yuekai Huang, Junjie Wang, Song Wang, Zhe Liu, Dandan Wang, Qing Wang

Track

ESEM 2021 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 15 Oct 2021 14:20 - 14:35 at ESEM ROOM - Mining Software Repositories Chair(s): Fabio Calefato

Abstract

Background. Deciding where to start contributing is a critical challenge for newcomers to open source projects. To support newcomers, GitHub provides the Good First Issue (GFI) label, with which project members can manually tag issues that are suitable for newcomers. However, manually labeling GFIs takes considerable time and effort given the large number of candidate issues. In addition, project members need a close understanding of the project to label GFIs accurately.

Aims. This paper aims to provide a thorough understanding of the characteristics of GFIs and an automatic approach to GFI prediction, reducing the burden on project members and helping newcomers onboard easily.

Method. We first define 79 features to characterize the GFIs and further analyze the correlation between each feature and GFIs. We then build machine learning models to predict GFIs with the proposed features.
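
As a rough illustration of the two steps named in the Method (characterizing issues with features, then correlating each feature with the GFI label), the following Python sketch computes a handful of text features and a Pearson correlation against a binary label. The five features, the function names, and the toy issues are assumptions made for this example only; they are not the paper's 79 features, and the paper's actual correlation analysis may differ.

```python
import math
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly for readability


def extract_features(title: str, body: str) -> dict:
    """Hypothetical text-richness/readability proxies for one issue.

    These are illustrative stand-ins, not the features defined in the paper.
    """
    words = re.findall(r"[A-Za-z0-9_']+", body)
    sentences = [s for s in re.split(r"[.!?]+\s+|\n+", body) if s.strip()]
    return {
        "title_words": len(title.split()),
        "body_words": len(words),
        "num_urls": len(re.findall(r"https?://\S+", body)),
        "num_code_blocks": body.count(FENCE) // 2,
        "avg_sentence_words": len(words) / len(sentences) if sentences else 0.0,
    }


def pearson(xs, ys) -> float:
    """Pearson correlation; with a 0/1 label this is the point-biserial r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(vx * vy)


# Toy data (invented for this sketch): (title, body, is_gfi)
issues = [
    ("Fix typo in README", "The word is misspelled. See https://example.com.", 1),
    ("Crash on startup", "Segfault in the allocator under load.\nNo repro yet.", 0),
    ("Add docs example", "Please add a usage example to the docs page.", 1),
]

rows = [extract_features(title, body) for title, body, _ in issues]
labels = [label for _, _, label in issues]
for name in rows[0]:
    r = pearson([row[name] for row in rows], labels)
    print(f"{name}: r = {r:.2f}")
```

With real data, the same loop would rank features by the strength of their association with the GFI label before any model is trained.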

Results. Experiments are conducted with 74,780 issues from 10 open source projects on GitHub. Results show that features related to the semantics, readability, and text richness of issues can be used to effectively characterize GFIs. Our prediction model achieves a median AUC of 0.88. Results from our user study further demonstrate its potential practical value.
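
For readers unfamiliar with the metric, the sketch below shows one way a median AUC could be computed: AUC per project as the probability that a randomly chosen GFI is scored above a randomly chosen non-GFI (ties count as one half), then the median over projects. Taking the median across projects is an assumption; the abstract does not say what the median is taken over. The project names, labels, and scores are toy values invented for this example and are not results from the paper.

```python
from statistics import median


def auc(labels, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_auc(per_project: dict) -> float:
    """Median of per-project AUCs (assumed aggregation, see lead-in)."""
    return median(auc(labels, scores) for labels, scores in per_project.values())


# Toy predictions for three hypothetical projects: (labels, model scores)
toy = {
    "project_a": ([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]),
    "project_b": ([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]),
    "project_c": ([1, 0], [0.5, 0.5]),
}
print(median_auc(toy))
```

The pairwise loop is quadratic in the number of issues, which is fine for a sketch; a rank-based formulation or a library routine would be the usual choice at the scale of tens of thousands of issues.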

Conclusions. This paper provides new insights and practical guidelines to facilitate the understanding of GFIs and the automation of GFI labeling.

Link to Preprint

https://www.eecs.yorku.ca/~wangsong/papers/esem21b.pdf

Yuekai Huang

Institute of Software, Chinese Academy of Sciences

China

Junjie Wang

Institute of Software, Chinese Academy of Sciences

China

Song Wang

York University

Canada

Zhe Liu

Institute of Software, Chinese Academy of Sciences

China

Dandan Wang

Institute of Software, Chinese Academy of Sciences

China

Qing Wang

Institute of Software, Chinese Academy of Sciences

China

Session Program

Fri 15 Oct
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:20 - 15:20	Mining Software Repositories (Technical Papers) at ESEM ROOM
Chair(s): Fabio Calefato (University of Bari)

14:20	15m Talk	Characterizing and Predicting Good First Issues
Yuekai Huang (Institute of Software, Chinese Academy of Sciences), Junjie Wang (Institute of Software, Chinese Academy of Sciences), Song Wang (York University), Zhe Liu (Institute of Software, Chinese Academy of Sciences), Dandan Wang (Institute of Software, Chinese Academy of Sciences), Qing Wang (Institute of Software, Chinese Academy of Sciences)

14:35	15m Talk	An Empirical Study on Refactoring-Inducing Pull Requests
Flavia Coelho (Federal University of Campina Grande), Nikolaos Tsantalis (Concordia University), Tiago Massoni (Federal University of Campina Grande), Everton L. G. Alves (Federal University of Campina Grande)

14:50	15m Talk	Promises and Perils of Inferring Personality on GitHub
Frenk van Mil (Delft University of Technology), Ayushi Rastogi (University of Groningen, The Netherlands), Andy Zaidman (Delft University of Technology)

15:05	15m Talk	An Exploratory Study on Dead Methods in Open-source Java Desktop Applications
Danilo Caivano (University of Bari), Pietro Cassieri (University of Basilicata), Simone Romano (University of Bari), Giuseppe Scanniello (University of Basilicata)

Information for Participants

Fri 15 Oct 2021 14:20 - 15:20 at ESEM ROOM - Mining Software Repositories Chair(s): Fabio Calefato

Info for room ESEM ROOM:

https://www.youtube.com/c/ESEM_Conference