Sat 3 May 2025 11:00 - 11:15 at 203 - ML4ESE Chair(s): Andreas Jedlitschka

Context: The emergence of Large Language Models (LLMs) has significantly transformed Software Engineering (SE) by providing innovative methods for analyzing software repositories. Objectives: Our objective is to establish a practical framework for future SE researchers who need to improve data collection and dataset quality when conducting software repository mining studies with LLMs. Method: This experience report shares insights from two previous repository mining studies, focusing on the methodologies used to create, refine, and validate prompts that improve LLM output, particularly for data collection in empirical studies. Results: Our research yields a framework, coined Prompt Refinement and Insights for Mining Empirical Software repositories (PRIMES), consisting of a checklist that can improve LLM performance, enhance output quality, and minimize errors through iterative refinement and comparisons among different LLMs. We also emphasize the importance of reproducibility by implementing mechanisms for tracking model results. Conclusion: Our findings indicate that standardizing prompt engineering and using PRIMES can enhance the reliability and reproducibility of studies that use LLMs. Ultimately, this work calls for further research on challenges such as hallucinations, model biases, and the cost-effectiveness of integrating LLMs into research workflows.

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30
ML4ESE (WSESE) at 203
Chair(s): Andreas Jedlitschka Fraunhofer IESE
11:00
15m
Talk
A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering
WSESE
Vincenzo De Martino University of Salerno, Joel Castaño Fernández Universitat Politècnica de Catalunya, Fabio Palomba University of Salerno, Xavier Franch Universitat Politècnica de Catalunya, Silverio Martínez-Fernández UPC-BarcelonaTech
Pre-print
11:15
18m
Talk
Can Machine Learning Support the Selection of Studies for Systematic Literature Review Updates?
WSESE
Marcelo Costalonga Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Bianca Minetto Napoleão Université du Québec à Chicoutimi, Maria Teresa Baldassarre Department of Computer Science, University of Bari, Katia Felizardo Federal Technological University of Paraná, Igor Steinmacher NAU RESHAPE LAB, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio)
11:33
18m
Talk
Applications and Implications of Large Language Models in Qualitative Analysis: A New Frontier for Empirical Software Engineering
WSESE
Matheus de Morais Leça University of Calgary, Lucas Valença University of Calgary, Reydne Bruno dos Santos UFPE, Ronnie de Souza Santos University of Calgary
11:51
18m
Talk
Large Language Model for Qualitative Research - A Systematic Mapping Study
WSESE
Cauã Ferreira Barros Federal University of Goiás, Bruna Borges Azevedo Federal University of Goiás, Valdemar Graciano Neto Federal University of Goiás, Mohamad Kassab Boston University, USA, Marcos Kalinowski Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Hugo Alexandre D. do Nascimento Federal University of Goiás, Michelle C.G.S.P. Bandeira Federal University of Goiás
12:09
21m
Live Q&A
ML4ESE: Discussion
WSESE
