Sat 3 May 2025 10:05 - 10:17 at 203 - Keynote & ESE4ML Chair(s): Sira Vegas

In the short period since the release of ChatGPT in November 2022, large language models (LLMs) have changed the software engineering research landscape. While there are numerous opportunities to use LLMs for supporting research or software engineering tasks, solid science needs rigorous empirical evaluations. However, so far, there are no specific guidelines for conducting and assessing studies involving LLMs in software engineering research. Our focus is on empirical studies that either use LLMs as part of the research process (e.g., for data annotation) or studies that evaluate existing or new tools that are based on LLMs. This paper contributes the first set of guidelines for such studies. Our goal is to start a discussion in the software engineering research community to reach a common understanding of what our community standards are for high-quality empirical studies involving LLMs.

Sat 3 May

Displayed time zone: Eastern Time (US & Canada)

09:00 - 10:30
Keynote & ESE4ML (WSESE) at 203
Chair(s): Sira Vegas Universidad Politecnica de Madrid
09:00 | 15m | Other | Welcome | WSESE | Sira Vegas (Universidad Politecnica de Madrid), Andreas Jedlitschka (Fraunhofer IESE)
09:15 | 50m | Keynote | The Methodological Implications of Using Generative AI in Software Engineering Research | WSESE | Margaret-Anne Storey (University of Victoria)
10:05 | 12m | Talk | Towards Evaluation Guidelines for Empirical Studies involving LLMs | WSESE | Stefan Wagner (Technical University of Munich), Marvin Muñoz Barón (Technical University of Munich), Falessi Davide (University of Rome Tor Vergata), Sebastian Baltes (University of Bayreuth)
10:17 | 13m | Live Q&A | Keynote & ESE4ML: Discussion | WSESE
