ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Sat 3 May 2025 11:15 - 11:33 at 203 - ML4ESE Chair(s): Andreas Jedlitschka

[Background] Systematic literature reviews (SLRs) are essential for synthesizing evidence in Software Engineering (SE), but keeping them up to date requires substantial effort. Study selection, one of the most labor-intensive steps, involves reviewing numerous studies and requires multiple reviewers to minimize bias and avoid loss of evidence. [Objective] This study aims to evaluate whether Machine Learning (ML) text classification models can support reviewers in study selection for SLR updates. [Method] We reproduced the study selection of an SLR update performed by three SE researchers. We trained two supervised ML models (Random Forest and Support Vector Machines) with different configurations using data from the original SLR. We calculated the study selection effectiveness of the ML models for the SLR update in terms of precision, recall, and F-measure. We also compared the performance of human-ML pairs with human-only pairs when selecting studies. [Results] The ML models achieved a modest F-measure of 0.33, which is insufficient for reliable automation. However, we found that such models can reduce the study selection effort by 33.9% without loss of evidence (keeping 100% recall). Our analysis also showed that the initial screening by pairs of human reviewers produces results that are much better aligned with the final SLR update result than those of human-ML pairs. [Conclusion] Based on our results, we conclude that although ML models can help reduce the effort involved in SLR updates, achieving rigorous and reliable outcomes still requires the expertise of experienced human reviewers for the initial screening phase.
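As a rough illustration of the metrics named in the abstract, the sketch below computes precision, recall, and F-measure for a binary include/exclude decision, and one possible reading of "effort reduction at 100% recall": the share of candidate studies scored below the lowest-scoring relevant study, which could be skipped without losing evidence. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration. They are not the authors' code, data, or exact procedure.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for binary labels (1 = include, 0 = exclude)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def effort_reduction_at_full_recall(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of studies that could be skipped while still keeping every relevant one.

    Assumption: the cut-off is the lowest score assigned to any relevant study,
    which is only known in hindsight (as in a reproduction with known outcomes).
    """
    relevant_scores = [s for t, s in zip(y_true, scores) if t == 1]
    if not relevant_scores:
        raise ValueError("need at least one relevant study")
    threshold = min(relevant_scores)
    skipped = sum(1 for s in scores if s < threshold)
    return skipped / len(scores)


# Toy data (hypothetical): 8 candidate studies, 2 of them truly relevant.
y_true = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.2, 0.4, 0.1, 0.6, 0.3, 0.05]  # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # hypothetical 0.5 decision threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
reduction = effort_reduction_at_full_recall(y_true, scores)
print(f"precision={precision:.2f} recall={recall:.2f} F={f1:.2f} effort reduction={reduction:.1%}")
```

Because the cut-off here is derived from the known labels, the figure is a retrospective measurement. In a live SLR update the threshold would have to be chosen without that knowledge.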

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
ML4ESE WSESE at 203
Chair(s): Andreas Jedlitschka Fraunhofer IESE
11:00
15m
Talk
A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering
WSESE
Vincenzo De Martino University of Salerno, Joel Castaño Fernández Universitat Politècnica de Catalunya, Fabio Palomba University of Salerno, Xavier Franch Universitat Politècnica de Catalunya, Silverio Martínez-Fernández UPC-BarcelonaTech
Pre-print
11:15
18m
Talk
Can Machine Learning Support the Selection of Studies for Systematic Literature Review Updates?
WSESE
Marcelo Costalonga Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Bianca Minetto Napoleão Université du Québec à Chicoutimi, Maria Teresa Baldassarre Department of Computer Science, University of Bari, Katia Felizardo Federal Technological University of Paraná, Igor Steinmacher NAU RESHAPE LAB, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio)
11:33
18m
Talk
Applications and Implications of Large Language Models in Qualitative Analysis: A New Frontier for Empirical Software Engineering
WSESE
Matheus de Morais Leça University of Calgary, Lucas Valença University of Calgary, Reydne Bruno dos Santos UFPE, Ronnie de Souza Santos University of Calgary
11:51
18m
Talk
Large Language Model for Qualitative Research - A Systematic Mapping Study
WSESE
Cauã Ferreira Barros Federal University of Goiás, Bruna Borges Azevedo Federal University of Goiás, Valdemar Graciano Neto Federal University of Goiás, Mohamad Kassab Boston University, USA, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Hugo Alexandre D. do Nascimento Federal University of Goiás, Michelle C.G.S.P. Bandeira Federal University of Goiás
12:09
21m
Live Q&A
ML4ESE: Discussion
WSESE