ESEIW 2025
Sun 28 September - Fri 3 October 2025

Context: Issue labeling is a fundamental task in software development, as it is critical for the effective management of software projects. The practice involves assigning each issue a label, such as bug or feature request, that denotes the kind of task it represents for project management. To date, large language models (LLMs) have been proposed to automate this task, including both fine-tuned BERT-like models and zero-shot GPT-like models.

Objectives: In this paper, we investigate which LLMs offer the best trade-off between performance, response time, hardware requirements, and quality of the responses for issue report classification.

Methods: We design and execute a comprehensive benchmark study to assess 22 generative decoder-only LLMs and 2 baseline BERT-like encoder-only models, which we evaluate on two different datasets of GitHub issues.

Results: Generative LLMs demonstrate potential for zero-shot classification. However, their performance varies significantly across datasets and they require substantial computational resources for deployment. In contrast, BERT-like models show more consistent performance and lower resource requirements.

Conclusions: Based on the empirical evidence provided in this study, we discuss implications for researchers and practitioners. In particular, our results suggest that fine-tuning BERT-like encoder-only models achieves consistent, state-of-the-art performance across datasets, even when only a small amount of labeled data is available for training.

Thu 2 Oct

Displayed time zone: Hawaii

13:50 - 14:50
13:50
15m
Talk
Contribution History as a Key Feature in OSS Task Recommendation: an LLM-Based Empirical Study
ESEM - Emerging Results and Vision Track
Md Abdul Hannan Colorado State University, Mohammad Habibullah Rakib Colorado State University, Khondaker Masfiq Reza Colorado State University, Fabio Marcos De Abreu Santos Colorado State University, USA
14:05
15m
Talk
Exploring LLMs for Stakeholder-Specific Insight Generation from Software Contracts
ESEM - Industry, Government, and Community Track
Jyoti Shukla TCS Research, Aditya Kahol TCS Research, Mohit Chaudhary TCS Research, Preethu Rose Anish TCS Research
14:20
15m
Talk
Benchmarking large language models for automated labeling: The case of issue report classification
ESEM - Journal First Track
Giuseppe Colavito University of Bari, Italy, Filippo Lanubile University of Bari, Nicole Novielli University of Bari
14:35
15m
Talk
Secret Breach Detection in Source Code with Large Language Models
ESEM - Technical Track
Md Nafiu Rahman Bangladesh University of Engineering and Technology, Sadif Ahmed Bangladesh University of Engineering and Technology, Zahin Wahab The University of British Columbia, S. M. Sohan Google Inc, Rifat Shahriyar Bangladesh University of Engineering and Technology Dhaka, Bangladesh