Automatic Identification of Decisions from the Hibernate Developer Mailing List
Decisions pervade the entire software development and maintenance process. Explicitly documenting these decisions helps to organize development knowledge and reduce its vaporization, thereby helping to control the development process and maintenance costs. It can also support knowledge acquisition for project stakeholders. Moreover, developers (e.g., architects) and managers can rely on decisions made in the past to solve problems encountered in their current projects. However, manually identifying decisions from massive textual artifacts requires considerable human effort, time, and cost, and is usually unaffordable given limited resources. To address this problem, we conducted an experiment to automatically identify decisions from textual artifacts using machine learning techniques. We created a dataset of 1,300 labelled sentences from the Hibernate developer mailing list, containing 650 decision sentences and 650 non-decision sentences, and trained machine learning models using 160 configurations of text preprocessing, feature extraction, and classification algorithms. The results show that (1) the text preprocessing method with Including Stop Words, No Stemming and Lemmatization, and No Filtering Out Sentences performs best when preprocessing posts to identify decisions; (2) the simple Bag-of-Words (BoW) model works best when extracting features to identify decisions; (3) the Support Vector Machine (SVM) algorithm achieves the best result when training classifiers to identify decisions; and (4) the SVM algorithm with Including Stop Words (ISW), No Stemming and Lemmatization (NSaL), Filtering Out Sentences by Length (FOSbL), and BoW achieves the best performance among all configurations when identifying decisions from the mailing list (a precision of 0.640, a recall of 0.932, and an F1-score of 0.759).
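As a quick sanity check on the reported figures, the F1-score is the harmonic mean of precision and recall, so the three numbers in the abstract can be checked against each other. The sketch below is illustrative only and is not the authors' code: the function names `f1_score` and `bag_of_words` are hypothetical, and the tokenizer is a deliberately naive whitespace split that keeps stop words and applies no stemming or lemmatization, in the spirit of the ISW and NSaL settings.

```python
from collections import Counter


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bag_of_words(sentence: str) -> Counter:
    """Count lowercase whitespace-separated tokens (hypothetical, naive BoW).

    Stop words are kept and no stemming or lemmatization is applied.
    """
    return Counter(sentence.lower().split())


# Precision and recall as reported in the abstract.
print(round(f1_score(0.640, 0.932), 3))  # prints 0.759

# Token counts for an invented example sentence.
print(bag_of_words("We decided that we will drop the old API"))
```

The computed value agrees with the reported F1-score of 0.759. The pairing of a high recall with a moderate precision means the best configuration finds most decision sentences but also flags a fair number of non-decision sentences.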
Wed 23 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
10:30 - 12:00 | Artificial intelligence in software engineering | EASE 2020 at Zoom | Chair(s): Torgeir Dingsøyr (Norwegian University of Science and Technology)
10:30 | 22m | Full-paper | Automatic Identification of Decisions from the Hibernate Developer Mailing List | EASE 2020 | Xueying Li (Wuhan University), Peng Liang (Wuhan University), Zengyang Li (Central China Normal University)
10:52 | 22m | Full-paper | A Bigram-based Inference Model for Retrieving Abbreviated Phrases in Source Code | EASE 2020
11:15 | 22m | Full-paper | A Multinomial Naive Bayesian (MNB) network to automatically recommend topics for GitHub repositories | EASE 2020 | Claudio Di Sipio (University of L'Aquila), Riccardo Rubei (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Phuong T. Nguyen (University of L'Aquila)
11:37 | 22m | Other | MLCQ: Industry-relevant Code Smell Data Set | EASE 2020