ChatGPT application in Systematic Literature Reviews in Software Engineering: an evaluation of its accuracy to support the selection activity (ESEIW 2024 - ESEM Technical Papers Track)

Who

Katia Romero Felizardo, Marcia Sampaio Lima, Anderson Deizepe, Tayana Conte, Igor Steinmacher

Track

ESEIW 2024 ESEM Technical Papers

When

Thu 24 Oct 2024 11:00 - 11:20 at Telensenyament (B3 Building - 1st Floor) - Empirical research methods Chair(s): Stefan Wagner

Abstract

Context: The Systematic Literature Review (SLR) process involves searching, selecting, and synthesizing relevant literature on a specific research topic for evidence-based decision-making in Software Engineering (SE). Because the SLR process is time-consuming, tool support is essential. Gap: ChatGPT is a significant advancement in Natural Language Processing (NLP), and it can potentially accelerate time-consuming and error-prone activities, such as the selection activity of the SLR process. Therefore, having a tool to assist in the selection process appears beneficial, and we argue that ChatGPT can facilitate the analysis of a large number of studies, saving time and effort. Objective: We aim to evaluate the accuracy (i.e., the share of studies correctly classified) of using ChatGPT-4.0 in SLRs in SE, particularly to support the first selection stage, which is based on the title, abstract, and keywords. Method: We assessed the accuracy of using ChatGPT for the first stage of study selection in two SLRs (SLR1 and SLR2), in contrast to the conventional method of reading the title and abstract. Results: The accuracy of ChatGPT in supporting the initial selection activity was 75.3% (SLR1 - 101 correct selections: 48 inclusions and 53 exclusions; 33 incorrect selections: 17 inclusions and 16 exclusions) and 86.1% (SLR2 - 386 correct selections: 113 inclusions and 273 exclusions; 62 incorrect selections: 27 inclusions and 35 exclusions). Conclusions: Our accuracy results indicate that it is not advisable to completely outsource the selection process to ChatGPT. However, it could be valuable as a support tool, aiding novice researchers or even experienced ones when they are in doubt.
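
The accuracy figures in the Results can be reproduced from the counts given in the abstract. The sketch below is not the authors' analysis script; it assumes accuracy means correct selections divided by all selections, and the names `accuracy`, `truncate_percent`, `COUNTS`, and `RESULTS` are illustrative. The unrounded values are about 75.37% and 86.16%, so the reported 75.3% and 86.1% match if the percentages are truncated to one decimal; that reading is an inference from the numbers, not something the abstract states.

```python
import math


def accuracy(correct: int, incorrect: int) -> float:
    """Fraction of studies classified correctly (assumed definition)."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no classified studies")
    return correct / total


def truncate_percent(fraction: float, decimals: int = 1) -> float:
    """Convert a fraction to a percentage, truncated rather than rounded."""
    scale = 10 ** decimals
    return math.floor(fraction * 100 * scale) / scale


# Counts copied from the Results sentence of the abstract:
# correct = correct inclusions + correct exclusions,
# incorrect = incorrect inclusions + incorrect exclusions.
COUNTS = {
    "SLR1": {"correct": 48 + 53, "incorrect": 17 + 16},
    "SLR2": {"correct": 113 + 273, "incorrect": 27 + 35},
}

RESULTS = {
    name: accuracy(c["correct"], c["incorrect"]) for name, c in COUNTS.items()
}

for name, acc in RESULTS.items():
    correct = COUNTS[name]["correct"]
    total = correct + COUNTS[name]["incorrect"]
    print(f"{name}: {correct}/{total} = {truncate_percent(acc)}%")
```

Under these assumptions SLR1 covers 134 classified studies and SLR2 covers 448.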

Katia Romero Felizardo

UTFPR-CP

Brazil

Marcia Sampaio Lima

Universidade do Estado do Amazonas - UEA

Brazil

Anderson Deizepe

UTFPR-CP

Brazil

Tayana Conte

Universidade Federal do Amazonas

Brazil

Igor Steinmacher

Northern Arizona University

United States

Session Program

Thu 24 Oct
Displayed time zone: (GMT+02:00) Brussels, Copenhagen, Madrid, Paris

11:00 - 12:30	Empirical research methods. ESEM Technical Papers / ESEM Emerging Results, Vision and Reflection Papers Track, at Telensenyament (B3 Building - 1st Floor). Chair(s): Stefan Wagner (Technical University of Munich)

11:00 20m Full-paper		ChatGPT application in Systematic Literature Reviews in Software Engineering: an evaluation of its accuracy to support the selection activity. ESEM Technical Papers. Katia Romero Felizardo (UTFPR-CP), Marcia Sampaio Lima (Universidade do Estado do Amazonas - UEA), Anderson Deizepe (UTFPR-CP), Tayana Conte (Universidade Federal do Amazonas), Igor Steinmacher (Northern Arizona University)
11:20 20m Full-paper		Is generalisation hindering the adoption of your findings? ESEM Technical Papers. Rogardt Heldal (Western Norway University of Applied Science)
11:40 20m Full-paper		Threats to Validity in Software Engineering -- hypocritical paper section or essential analysis? ESEM Technical Papers. Patricia Lago (Vrije Universiteit Amsterdam), Per Runeson (Lund University), Qunying Song (Lund University), Roberto Verdecchia (University of Florence)
12:00 15m Vision and Emerging Results		Data extraction for systematic mapping study using a large language model - a proof-of-concept study in software engineering. ESEM Emerging Results, Vision and Reflection Papers Track. Katia Romero Felizardo (UTFPR-CP), Igor Steinmacher (Northern Arizona University), Marcia Sampaio Lima (Universidade do Estado do Amazonas - UEA), Anderson Deizepe (UTFPR-CP), Tayana Conte (Universidade Federal do Amazonas), Monalessa P. Barcellos (Federal University of Espírito Santo)
12:15 15m Vision and Emerging Results		Crossover Designs in Software Engineering Experiments: Review of the State of Analysis. ESEM Emerging Results, Vision and Reflection Papers Track. Julian Frattini (Blekinge Institute of Technology), Davide Fucci (Blekinge Institute of Technology), Sira Vegas (Universidad Politecnica de Madrid)