Background. Deciding where to start contributing is a critical challenge for newcomers to open source projects. To support newcomers, GitHub provides the Good First Issue (GFI) label, with which project members can manually tag issues that are suitable for newcomers. However, manually labeling GFIs is time-consuming and labor-intensive given the large number of candidate issues. In addition, project members need a close understanding of the project to label GFIs accurately.
Aims. This paper aims to provide a thorough understanding of the characteristics of GFIs and an automatic approach to GFI prediction, reducing the burden on project members and helping newcomers onboard more easily.
Method. We first define 79 features to characterize the GFIs and further analyze the correlation between each feature and GFIs. We then build machine learning models to predict GFIs with the proposed features.
Results. Experiments are conducted on 74,780 issues from 10 open source projects on GitHub. Results show that features related to the semantics, readability, and text richness of issues can be used to effectively characterize GFIs. Our prediction model achieves a median AUC of 0.88. Results from our user study further support its potential practical value.
Conclusions. This paper provides new insights and practical guidelines to facilitate the understanding of GFIs and the automation of GFI labeling.
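As a rough illustration of the metric reported in the Results, the sketch below computes AUC as the probability that a randomly chosen GFI is scored above a randomly chosen non-GFI, then takes the median over projects. The abstract does not say how the median is aggregated, so taking it across projects is an assumption here; the project names, labels, and scores are hypothetical toy values, not data from the paper.

```python
from statistics import median


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: label 1 = tagged as GFI, score = model confidence.
toy_projects = {
    "project_a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "project_b": ([1, 0, 0, 1], [0.8, 0.7, 0.1, 0.3]),
    "project_c": ([0, 1, 0, 0], [0.5, 0.5, 0.2, 0.1]),
}

# Assumption: one AUC per project, summarized by the median.
per_project_auc = {name: auc(y, s) for name, (y, s) in toy_projects.items()}
median_auc = median(per_project_auc.values())
```

The pairwise loop is quadratic in the number of issues, which is fine for a sketch; a rank-based formulation would be preferable at the scale of tens of thousands of issues.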
Fri 15 Oct
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
14:20 - 15:20 | Mining Software Repositories | Technical Papers at ESEM ROOM | Chair(s): Fabio Calefato (University of Bari)
14:20 | 15m Talk | Characterizing and Predicting Good First Issues | Technical Papers | Yuekai Huang (Institute of Software, Chinese Academy of Sciences); Junjie Wang (Institute of Software at Chinese Academy of Sciences); Song Wang (York University); Zhe Liu (Institute of Software at Chinese Academy of Sciences); Dandan Wang (Institute of Software, Chinese Academy of Sciences); Qing Wang (Institute of Software at Chinese Academy of Sciences)
14:35 | 15m Talk | An Empirical Study on Refactoring-Inducing Pull Requests | Technical Papers | Flavia Coelho (Federal University of Campina Grande); Nikolaos Tsantalis (Concordia University); Tiago Massoni (Federal University of Campina Grande); Everton L. G. Alves (Federal University of Campina Grande)
14:50 | 15m Talk | Promises and Perils of Inferring Personality on GitHub | Technical Papers | Frenk van Mil (Delft University of Technology); Ayushi Rastogi (University of Groningen, The Netherlands); Andy Zaidman (Delft University of Technology)
15:05 | 15m Talk | An Exploratory Study on Dead Methods in Open-source Java Desktop Applications | Technical Papers | Danilo Caivano (University of Bari); Pietro Cassieri (University of Basilicata); Simone Romano (University of Bari); Giuseppe Scanniello (University of Basilicata)