ICSE 2025
Sat 26 April - Sun 4 May 2025 Ottawa, Ontario, Canada
Thu 1 May 2025 15:00 - 15:15 at 213 - AI for Program Comprehension 2 Chair(s): Oscar Chaparro

To support software developers in understanding and maintaining programs, various automatic (source) code summarization techniques have been proposed to generate a concise natural language summary (i.e., comment) for a given code snippet. Recently, the emergence of large language models (LLMs) has led to a great boost in the performance of code-related tasks. In this paper, we undertake a systematic and comprehensive study on code summarization in the era of LLMs, which covers multiple aspects involved in the workflow of LLM-based code summarization. Specifically, we begin by examining prevalent automated evaluation methods for assessing the quality of summaries generated by LLMs and find that the results of the GPT-4 evaluation method are most closely aligned with human evaluation. Then, we explore the effectiveness of five prompting techniques (zero-shot, few-shot, chain-of-thought, critique, and expert) in adapting LLMs to code summarization tasks. Contrary to expectations, advanced prompting techniques may not outperform simple zero-shot prompting. Next, we investigate the impact of LLMs' model settings (including the top_p and temperature parameters) on the quality of generated summaries. We find that the impact of these two parameters on summary quality varies with the base LLM and the programming language, and that the two parameters affect summary quality in similar ways. Moreover, we canvass LLMs' abilities to summarize code snippets in distinct types of programming languages. The results reveal that LLMs perform suboptimally when summarizing code written in logic programming languages compared to other language types (e.g., procedural and object-oriented programming languages). Finally, we unexpectedly find that CodeLlama with 7B parameters can outperform advanced GPT-4 in generating summaries describing code implementation details and asserting code properties. We hope that our findings can provide a comprehensive understanding of code summarization in the era of LLMs.
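To make the compared prompting techniques concrete, the sketch below contrasts zero-shot and few-shot prompt construction for code summarization, along with sampling settings of the kind the study varies (top_p, temperature). The prompt wording, function names, and parameter values here are illustrative assumptions, not the templates used in the paper.

```python
# Illustrative sketch of two of the five prompting techniques the study compares.
# Templates and settings below are assumptions for demonstration only.

def zero_shot_prompt(code: str) -> str:
    """Ask the model directly for a one-sentence summary of the snippet."""
    return (
        "Please generate a short comment in one sentence for the following function:\n"
        f"{code}"
    )

def few_shot_prompt(code: str, examples: list[tuple[str, str]]) -> str:
    """Prepend (code, summary) demonstration pairs before the target snippet."""
    parts = []
    for demo_code, demo_summary in examples:
        parts.append(f"Code:\n{demo_code}\nSummary: {demo_summary}\n")
    parts.append(f"Code:\n{code}\nSummary:")
    return "\n".join(parts)

# Hypothetical sampling settings of the kind varied in the study.
sampling_settings = {"temperature": 0.2, "top_p": 0.95}

snippet = "def add(a, b):\n    return a + b"
prompt = few_shot_prompt(
    snippet,
    examples=[("def sub(a, b):\n    return a - b", "Subtracts b from a.")],
)
```

Under this sketch, the zero-shot variant sends only the target snippet, while the few-shot variant prefixes demonstration pairs; the same snippet can then be sent under different temperature/top_p combinations to probe their effect on summary quality.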

Thu 1 May

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
AI for Program Comprehension 2 (Research Track) at 213
Chair(s): Oscar Chaparro William & Mary
14:00
15m
Talk
Code Comment Inconsistency Detection and Rectification Using a Large Language Model
Research Track
Guoping Rong Nanjing University, Yongda Yu Nanjing University, Song Liu Nanjing University, Xin Tan Nanjing University, Tianyi Zhang Nanjing University, Haifeng Shen Southern Cross University, Jidong Hu Zhongxing Telecom Equipment
14:15
15m
Talk
Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation
Research Track
Aaron Imani University of California, Irvine, Iftekhar Ahmed University of California at Irvine, Mohammad Moshirpour University of California, Irvine
14:30
15m
Talk
HedgeCode: A Multi-Task Hedging Contrastive Learning Framework for Code Search
Research Track
Gong Chen Wuhan University, Xiaoyuan Xie Wuhan University, Xunzhu Tang University of Luxembourg, Qi Xin Wuhan University, Wenjie Liu Wuhan University
14:45
15m
Talk
Reasoning Runtime Behavior of a Program with LLM: How Far Are We?
Research Track
Junkai Chen Zhejiang University, Zhiyuan Pan Zhejiang University, Xing Hu Zhejiang University, Zhenhao Li York University, Ge Li Peking University, Xin Xia Huawei
15:00
15m
Talk
Source Code Summarization in the Era of Large Language Models
Research Track
Weisong Sun Nanjing University, Yun Miao Nanjing University, Yuekang Li UNSW, Hongyu Zhang Chongqing University, Chunrong Fang Nanjing University, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Yang Liu Nanyang Technological University, Zhenyu Chen Nanjing University
Media Attached
15:15
15m
Talk
Template-Guided Program Repair in the Era of Large Language Models
Research Track
Kai Huang, Jian Zhang Nanyang Technological University, Xiangxin Meng Beihang University, Beijing, China, Yang Liu Nanyang Technological University
File Attached