A Framework for Using LLMs for Repository Mining Studies in Empirical Software Engineering (WSESE 2025 - 2nd International Workshop on Methodological Issues with Empirical Studies in Software Engineering)

Who

Vincenzo De Martino, Joel Castaño Fernández, Fabio Palomba, Xavier Franch, Silverio Martínez-Fernández

Track

WSESE 2025 Empirical Studies in SE

When

Sat 3 May 2025 11:00 - 11:15 at 203 - ML4ESE Chair(s): Andreas Jedlitschka

Abstract

Context: The emergence of Large Language Models (LLMs) has significantly transformed Software Engineering (SE) by providing innovative methods for analyzing software repositories. Objectives: Our objective is to establish a practical framework for SE researchers who need to improve their data collection and datasets when conducting software repository mining studies with LLMs. Method: This experience report shares insights from two previous repository mining studies, focusing on the methodologies used for creating, refining, and validating prompts that enhance the output of LLMs, particularly in the context of data collection in empirical studies. Results: We package these insights into a framework, named Prompt Refinement and Insights for Mining Empirical Software repositories (PRIMES), consisting of a checklist that can improve LLM usage performance, enhance output quality, and minimize errors through iterative processes and comparisons among different LLMs. We also emphasize the significance of reproducibility by implementing mechanisms for tracking model results. Conclusion: Our findings indicate that standardizing prompt engineering and using PRIMES can enhance the reliability and reproducibility of studies that use LLMs. Ultimately, this work calls for further research to address challenges such as hallucinations, model biases, and cost-effectiveness when integrating LLMs into research workflows.
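The abstract mentions tracking model results and comparing different LLMs but does not say how. As a rough illustration only, the sketch below logs each LLM call to a JSON Lines file and flags prompts on which models disagree. It is not the PRIMES implementation: the function names (record_run, load_runs, disagreements), the record fields, and the file layout are all assumptions made for this example, and the model outputs are passed in as plain strings rather than obtained from any real API.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_run(log_path, model, prompt_version, prompt, output, params=None):
    """Append one LLM call to a JSON Lines log and return the stored record.

    Hypothetical helper: the field names are illustrative, not taken from PRIMES.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "prompt_version": prompt_version,
        # Hash the prompt so identical prompts can be matched across models.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "params": params or {},
        "output": output,
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def load_runs(log_path):
    """Read every logged call back as a list of dictionaries."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def disagreements(runs):
    """Return, per prompt hash, the model outputs whenever models disagree.

    If a model was logged more than once for a prompt, its last output is used.
    """
    by_prompt = {}
    for run in runs:
        by_prompt.setdefault(run["prompt_sha256"], {})[run["model"]] = run["output"]
    return {h: outs for h, outs in by_prompt.items() if len(set(outs.values())) > 1}
```

Under these assumptions, a study would call record_run once per LLM response and periodically pass load_runs(...) to disagreements(...) to pick out items that may need manual review. An append-only log keeps earlier results intact when prompts are refined across iterations.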

Link to Preprint

https://arxiv.org/abs/2411.09974

Vincenzo De Martino

University of Salerno

Italy

Joel Castaño Fernández

Universitat Politècnica de Catalunya

Fabio Palomba

University of Salerno

Italy

Xavier Franch

Universitat Politècnica de Catalunya

Spain

Silverio Martínez-Fernández

UPC-BarcelonaTech