FSE 2025
Mon 23 - Fri 27 June 2025 Trondheim, Norway
co-located with ISSTA 2025
Wed 25 Jun 2025 14:20 - 14:40 at Cosmos Hall - LLM for SE 4 Chair(s): Ting Su

A good summary can often be very useful during program comprehension. While a brief, fluent, and relevant summary can be helpful, producing one requires significant human effort. Good summaries are often unavailable in software projects, making maintenance more difficult. There has been a considerable body of research into automated AI-based methods, using Large Language Models (LLMs), to generate summaries of code; there has also been quite a bit of work on ways to measure the performance of such summarization methods, with special attention paid to how closely these AI-generated summaries resemble a summary a human might have produced. Measures such as BERTScore and BLEU have been suggested and evaluated in human-subject studies.
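Metrics like BLEU rest on n-gram overlap between a candidate summary and a reference. As a rough illustration of the idea (a simplified sketch, not the paper's evaluation code), a unigram-plus-bigram precision with a brevity penalty can be computed like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n), scaled by a brevity penalty for short candidates."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "returns the sum of two integers"
print(simple_bleu(ref, ref))                  # 1.0 for an identical summary
print(simple_bleu("adds two integers", ref))  # lower for a partial match
```

Real evaluations use the full BLEU formulation (up to 4-grams, with smoothing), and BERTScore instead compares contextual embeddings rather than surface n-grams.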

However, prior work has noted that LLM-produced summaries can be too long, disfluent, irrelevant, etc.: in general, too dissimilar to what a human might say. Given an LLM-produced code summary, how can we judge whether it is good enough? Given some input source code and an LLM-generated summary, existing approaches can help judge brevity, fluency, and relevance; however, it is difficult to gauge whether an LLM-produced summary sufficiently resembles what a human might produce without a “golden” human-produced summary to compare against. Prior research indicates that human-produced summaries are generally preferred by human raters, so we explore this issue in this paper. We study this resemblance question as a calibration problem: given just the summary from an LLM, can we compute a confidence measure that provides a reliable indication of whether the summary sufficiently resembles what a human would have produced in this situation? We examine this question using several LLMs, for several languages, and in several different settings. Our investigation suggests approaches that provide reliable predictions of the likelihood that an LLM-generated summary would sufficiently resemble a summary a human might write for the same code.
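A confidence measure is well calibrated if, among summaries assigned confidence c, roughly a fraction c actually meet the bar. The standard way to quantify miscalibration is Expected Calibration Error (ECE); a minimal sketch of the general metric (not the paper's specific method), assuming we already have per-summary confidences and binary "resembles a human summary" labels:

```python
def expected_calibration_error(confidences, labels, n_bins=10):
    """ECE: bin predictions by confidence, then average the
    |accuracy - mean confidence| gap per bin, weighted by bin size."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are half-open (lo, hi]; put exact zeros in the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(labels[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(accuracy - avg_conf)
    return ece

# Toy data: high-confidence summaries are acceptable, low-confidence ones are not.
confs = [0.95, 0.95, 0.95, 0.95, 0.15, 0.15]
labs = [1, 1, 1, 1, 0, 0]
print(expected_calibration_error(confs, labs))  # ≈ 0.083, close to well calibrated
```

A lower ECE means the reported confidence can be trusted as a probability, which is exactly what a downstream tool needs to decide whether to show an LLM-generated summary to a developer.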

Wed 25 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:20
LLM for SE 4 (Research Papers / Journal First) at Cosmos Hall
Chair(s): Ting Su East China Normal University
14:00
20m
Talk
Large Language Models for Software Engineering: A Systematic Literature Review
Journal First
Xinyi Hou Huazhong University of Science and Technology, Yanjie Zhao Huazhong University of Science and Technology, Yue Liu Monash University, Zhou Yang Singapore Management University; University of Alberta, Kailong Wang Huazhong University of Science and Technology, Li Li Beihang University, Xiapu Luo Hong Kong Polytechnic University, David Lo Singapore Management University, John Grundy Monash University, Haoyu Wang Huazhong University of Science and Technology
14:20
20m
Talk
Calibration of Large Language Models on Code Summarization
Research Papers
Yuvraj Virk UC Davis, Prem Devanbu University of California at Davis, Toufique Ahmed IBM Research
DOI
14:40
20m
Talk
Code Red! On the Harmfulness of Applying Off-the-shelf Large Language Models to Programming Tasks
Research Papers
Ali Al-Kaswan Delft University of Technology, Netherlands, Sebastian Deatc Delft University of Technology, Begüm Koç Delft University of Technology, Arie van Deursen TU Delft, Maliheh Izadi Delft University of Technology
DOI Pre-print
15:00
20m
Talk
PATCH: Empowering Large Language Model with Programmer-Intent Guidance and Collaborative-Behavior Simulation for Automatic Bug Fixing
Journal First
Yuwei Zhang Institute of Software Chinese Academy of Sciences, Zhi Jin Peking University, Ying Xing Beijing University of Posts and Telecommunications, Ge Li Peking University, Fang Liu Beihang University, Jiaxin Zhu Institute of Software at Chinese Academy of Sciences, Wensheng Dou Institute of Software Chinese Academy of Sciences, Jun Wei Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences

Information for Participants
Wed 25 Jun 2025 14:00 - 15:20 at Cosmos Hall - LLM for SE 4 Chair(s): Ting Su
Info for room Cosmos Hall:

This is the main event hall of the Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also take place in this room.

The room is just in front of the registration desk, on the other side of the main conference area. The large doors with numbers “1” and “2” provide access to the Cosmos Hall.
