ChatGPT application in Systematic Literature Reviews in Software Engineering: an evaluation of its accuracy to support the selection activity
Context: The Systematic Literature Review (SLR) process involves searching, selecting, and synthesizing relevant literature on a specific research topic for evidence-based decision-making in Software Engineering (SE). Due to the time-consuming of the SLR process, tool support is essential. Gap: ChatGPT is a significant advancement in Natural Language Processing (NLP), and it can potentially accelerate time-consuming and propone-error activities, such as the selection activity of the SLR process. Therefore, having a tool to assist in the selection process appears beneficial, and we argue that ChatGPT can facilitate the analysis of extensive studies, saving time and effort. Objective: We aim to evaluate the accuracy (i.e., studies correctly classified) of using ChatGPT-4.0 in SLR in SE, particularly to support the first stage, based on the title, abstract, and keywords. Method: We assessed the accuracy of utilizing ChatGPT for selecting studies, the first stage, to be included in two SLRs (SLR1 and SLR2), in contrast to the conventional method of reading the title and abstract. Results: The accuracy of ChatGPT supporting the initial selection activity was 75.3% (SLR1 - 101 correct selections: 48 inclusions and 53 exclusions; 33 incorrect selections: 17 inclusions and 16 exclusions) and 86.1% (SLR2 - 386 correct selections: 113 inclusions and 273 exclusions; 62 incorrect selections: 27 inclusions and 35 exclusions). Conclusions: Our accuracy results indicate that it is not advisable to completely outsource the selection process to ChatGPT. However, it could be valuable as a support tool, aiding novice researchers or even experienced ones when they are in doubt.