Good things come in three: Generating SO Post Titles with Pre-Trained Models, Self Improvement and Post Ranking (ESEIW 2024 - ESEM Technical Papers Track)

Who

Duc Anh Le, Anh M. T. Bui, Phuong T. Nguyen, Davide Di Ruscio

Track

ESEIW 2024 ESEM Technical Papers

When

Thu 24 Oct 2024 16:40 - 17:00 (GMT+02:00) at Telensenyament (B3 Building - 1st Floor)
Session: Machine learning for software engineering
Chair(s): Luigi Quaranta

Abstract

Stack Overflow is a prominent Q&A forum, supporting developers in seeking suitable resources on programming-related matters. Having high-quality question titles is an effective means to attract developers’ attention. Unfortunately, this is often underestimated, leaving room for improvement. Prior research has predominantly leveraged pre-trained models to generate titles from code snippets and problem descriptions. Yet, getting high-quality titles is still a challenging task, owing to both the quality of the input data (e.g., containing noise and ambiguity) and inherent constraints in sequence generation models. In this paper, we present FILLER as a solution to generating Stack Overflow post titles using a fine-tuned language model with self-improvement and post ranking. Our study focuses on enhancing pre-trained language models for generating titles for Stack Overflow posts, employing a training and subsequent fine-tuning paradigm for these models. To this end, we integrate the model’s predictions into the training process, enabling it to learn from its errors, thereby lessening the effects of exposure bias. Moreover, we apply a post-ranking method to produce a variety of sample candidates, subsequently selecting the most suitable one. To evaluate FILLER, we perform experiments using benchmark datasets, and the empirical findings indicate that our model provides high-quality recommendations. It also significantly outperforms all the baselines, including Code2Que, SOTitle, CCBERT, M3NSCT5, and GPT3.5-turbo. A user study further shows that FILLER provides more relevant titles than SOTitle and GPT3.5-turbo.
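The post-ranking step described in the abstract (produce several candidate titles, then select the most suitable one) can be illustrated with a small sketch. The abstract does not say how candidates are scored, so the code below assumes a simple bag-of-words cosine similarity between each candidate and the post text, purely for illustration. The function names (tokenize, cosine_similarity, rank_candidates) and the hard-coded candidates are hypothetical; this is not FILLER's implementation, and in practice the candidates would come from sampling the fine-tuned model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens (plus _, + and #)."""
    return re.findall(r"[a-z0-9_+#]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(post_text: str, candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate title against the post and sort best-first.

    ASSUMPTION: similarity to the post text is used as the ranking
    criterion; the abstract does not specify the actual scoring function.
    """
    post_vec = Counter(tokenize(post_text))
    scored = [
        (title, cosine_similarity(Counter(tokenize(title)), post_vec))
        for title in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical post and candidate titles, standing in for model samples.
post = "How do I read a CSV file into a pandas DataFrame in Python?"
candidates = [
    "Python question",
    "How to read a CSV file with pandas",
    "Need help with my code",
]
ranked = rank_candidates(post, candidates)
best_title = ranked[0][0]
```

Here best_title holds the candidate that shares the most vocabulary with the post. A lexical score like this is cheap and needs no reference title at inference time, but it is only one possible choice; any scoring function that takes a candidate and the post could be substituted in rank_candidates.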

Link to Preprint

https://arxiv.org/pdf/2406.15633

Duc Anh Le

Hanoi University of Science and Technology

Anh M. T. Bui

Hanoi University of Science and Technology

Vietnam

Phuong T. Nguyen

University of L’Aquila

Italy

Davide Di Ruscio

University of L’Aquila