Automatic Identification of Decisions from the Hibernate Developer Mailing List (EASE 2021 - EASE 2020)

Who

Xueying Li, Peng Liang, Zengyang Li

Track

EASE 2021 EASE 2020


When

Wed 23 Jun 2021 10:30 - 10:52 at Zoom - Artificial intelligence in software engineering Chair(s): Torgeir Dingsøyr

Abstract

Decisions pervade the entire software development and maintenance process. Explicitly documenting these decisions helps to organize development knowledge and to reduce its vaporization, thereby controlling the development process and maintenance costs. It can also support the knowledge acquisition process for stakeholders of the project. Meanwhile, developers (e.g., architects) and managers will be able to rely on the decisions made in the past to solve the problems encountered in their current projects. However, identifying decisions from massive textual artifacts, which involves considerable human effort, time, and cost, is usually unaffordable due to limited resources. To address this problem, we conducted an experiment to automatically identify decisions from textual artifacts using machine learning techniques. We created a dataset of 1,300 labelled sentences from the Hibernate developer mailing list, containing 650 decision sentences and 650 non-decision sentences, and trained machine learning models using 160 configurations of text preprocessing, feature extraction, and classification algorithms. The results show that (1) the text preprocessing method with Including Stop Words, No Stemming and Lemmatization, and No Filtering Out Sentences performs best when preprocessing posts to identify decisions; (2) the simple Bag-of-Words (BoW) model works best when extracting features to identify decisions; (3) the Support Vector Machine (SVM) algorithm achieves the best result when training classifiers to identify decisions; and (4) the SVM algorithm with Including Stop Words (ISW), No Stemming and Lemmatization (NSaL), Filtering Out Sentences by Length (FOSbL), and BoW achieves the best performance (with a precision of 0.640, a recall of 0.932, and an F1-score of 0.759), compared with other configurations when identifying decisions from the mailing list.
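
As a rough illustration of two ideas in the abstract, the sketch below builds Bag-of-Words count vectors while keeping stop words and applying no stemming or lemmatization, and checks that the reported F1-score is the harmonic mean of the reported precision and recall. It is not the authors' pipeline: the function names and the two example sentences are hypothetical, whitespace tokenization is an assumption, and the SVM training step is omitted.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase and split on whitespace only. Stop words are
    # kept and no stemming or lemmatization is applied (ISW + NSaL).
    return sentence.lower().split()


def bag_of_words(sentences):
    # Build a sorted vocabulary, then one term-count vector per sentence.
    vocab = sorted({tok for s in sentences for tok in tokenize(s)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for s in sentences:
        vec = [0] * len(vocab)
        for tok, n in Counter(tokenize(s)).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


def f1_score(precision, recall):
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences, not taken from the paper's dataset.
sentences = [
    "we will use the new api",
    "the build is broken again",
]
vocab, vectors = bag_of_words(sentences)

# Precision and recall as reported in the abstract.
best_f1 = round(f1_score(0.640, 0.932), 3)
print(best_f1)  # 0.759
```

The count vectors produced this way are the kind of input a classifier such as an SVM would be trained on; the low precision and high recall combine to the reported F1-score of 0.759.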

Link to Preprint

https://www.researchgate.net/publication/339209767_Automatic_Identification_of_Decisions_from_the_Hibernate_Developer_Mailing_List

Xueying Li

Wuhan University

China

Peng Liang

Wuhan University

China

Zengyang Li

Central China Normal University

China


Session Program

Wed 23 Jun
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:00  Artificial intelligence in software engineering (EASE 2020) at Zoom. Chair(s): Torgeir Dingsøyr (Norwegian University of Science and Technology)

10:30  22m  Full-paper  Automatic Identification of Decisions from the Hibernate Developer Mailing List. Xueying Li (Wuhan University), Peng Liang (Wuhan University), Zengyang Li (Central China Normal University)
10:52  22m  Full-paper  A Bigram-based Inference Model for Retrieving Abbreviated Phrases in Source Code. Abdulrahman Alatawi, Weifeng Xu (University of Baltimore), Dianxiang Xu (University of Missouri)
11:15  22m  Full-paper  A Multinomial Naive Bayesian (MNB) network to automatically recommend topics for GitHub repositories. Claudio Di Sipio (University of L'Aquila), Riccardo Rubei (University of L'Aquila), Davide Di Ruscio (University of L'Aquila), Phuong T. Nguyen (University of L'Aquila)
11:37  22m  Other  MLCQ: Industry-relevant Code Smell Data Set. Lech Madeyski, Tomasz Lewowski (Wrocław University of Science and Technology)