Revisiting Learning-based Commit Message Generation (ICSE 2023 - Technical Track)

Who

Jinhao Dong, Yiling Lou, Dan Hao, Lin Tan

Track

ICSE 2023 Technical Track

Time Zone

The program is currently displayed in (GMT+10:00) Hobart.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 May 2023 16:15 - 16:30 at Level G - Plenary Room 1. Session: Documentation. Chair(s): Denys Poshyvanyk

Abstract

Commit messages summarize code changes and help developers understand the intention. To alleviate human efforts in writing commit messages, researchers have proposed various automated commit message generation techniques, among which learning-based techniques have achieved great success in recent years. However, existing evaluation on learning-based commit message generation relies on the automatic metrics (e.g., BLEU) widely used in natural language processing (NLP) tasks, which are aggregated scores calculated based on the similarity between generated commit messages and the ground truth. Therefore, it remains unclear what generated commit messages look like and what kind of commit messages could be precisely generated by existing learning-based techniques.

To fill this knowledge gap, this work performs the first study to systematically investigate the detailed commit messages generated by learning-based techniques. In particular, we first investigate the frequent patterns of the commit messages generated by state-of-the-art learning-based techniques. Surprisingly, we find the majority (~90%) of their generated commit messages belong to simple patterns (i.e., addition/removal/fix/avoidance patterns). To further explore the reasons, we then study the impact of datasets, input representations, and model components. We surprisingly find that existing learning-based techniques have competitive performance even when the inputs are only represented by change marks (i.e., "+"/"-"/" "). This indicates that existing learning-based techniques poorly utilize syntax and semantics in the code while mostly focusing on change marks, which could be the major reason for generating so many pattern-matching commit messages. We also find that the pattern ratio in the training set might positively affect the pattern ratio of generated commit messages, and that model components might have different impacts on the pattern ratio.
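To make the two notions in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a message belongs to a simple pattern when its first word is one of a few verbs, and that a "change marks only" input keeps just the leading "+", "-", or " " of each diff line. The keyword lists and the function names (`classify_message`, `pattern_ratio`, `change_marks_only`) are hypothetical and chosen for illustration; the paper's actual pattern definitions and preprocessing may differ.

```python
import re

# Hypothetical keyword lists (an assumption for illustration only);
# the study's real pattern definitions may be broader or stricter.
SIMPLE_PATTERNS = {
    "addition": ("add", "adds", "added"),
    "removal": ("remove", "removes", "removed", "delete", "deleted"),
    "fix": ("fix", "fixes", "fixed"),
    "avoidance": ("avoid", "avoids", "avoided"),
}


def classify_message(message):
    """Return the simple pattern matched by the message's first word, or None."""
    words = re.findall(r"[a-z]+", message.lower())
    if not words:
        return None
    for pattern, verbs in SIMPLE_PATTERNS.items():
        if words[0] in verbs:
            return pattern
    return None


def pattern_ratio(messages):
    """Fraction of messages that fall into any of the simple patterns."""
    if not messages:
        return 0.0
    matched = sum(1 for m in messages if classify_message(m) is not None)
    return matched / len(messages)


def change_marks_only(diff_text):
    """Reduce a unified diff to its per-line change marks ('+', '-', ' ').

    Simplification: any line starting with '+++', '---', '@@', 'diff ' or
    'index ' is treated as a header, which can misclassify unusual code lines.
    """
    marks = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@", "diff ", "index ")):
            continue  # skip file and hunk headers
        if line[:1] in ("+", "-", " "):
            marks.append(line[0])
    return marks


if __name__ == "__main__":
    generated = [
        "Add unit tests for parser",
        "Fix NPE in loader",
        "Refactor scheduler to use coroutines",
        "Removed unused import",
    ]
    print(pattern_ratio(generated))  # 3 of 4 messages match a simple pattern

    diff = "--- a/f.py\n+++ b/f.py\n@@ -1,3 +1,3 @@\n import os\n-x = 1\n+x = 2\n"
    print(change_marks_only(diff))
```

Under these assumptions, a model that sees only the output of `change_marks_only` has no access to identifiers or code structure, which is the kind of degraded input the abstract reports as still yielding competitive scores.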

Link to Preprint

https://raw.githubusercontent.com/DJjjjhao/ICSE-MSG-STUDY/main/preprint.pdf

Jinhao Dong

Peking University

China

Yiling Lou

Fudan University

China

Dan Hao

Peking University

China

Lin Tan

Purdue University

United States

Session Program

Wed 17 May
Displayed time zone: Hobart

15:45 - 17:15	Documentation (Technical Track / Journal-First Papers) at Level G - Plenary Room 1. Chair(s): Denys Poshyvanyk (College of William and Mary)

15:45 15m Talk: Developer-Intent Driven Code Comment Generation. Technical Track. Fangwen Mu (Institute of Software Chinese Academy of Sciences), Xiao Chen (Institute of Software Chinese Academy of Sciences), Lin Shi (ISCAS), Song Wang (York University), Qing Wang (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
16:00 15m Talk: Data Quality Matters: A Case Study of Obsolete Comment Detection. Technical Track. Shengbin Xu (Nanjing University), Yuan Yao (Nanjing University), Feng Xu (Nanjing University), Tianxiao Gu (TikTok Inc.), Jingwei Xu, Xiaoxing Ma (Nanjing University)
16:15 15m Talk: Revisiting Learning-based Commit Message Generation. Technical Track. Jinhao Dong (Peking University), Yiling Lou (Fudan University), Dan Hao (Peking University), Lin Tan (Purdue University)
16:30 15m Talk: Commit Message Matters: Investigating Impact and Evolution of Commit Message Quality. Technical Track. Jiawei Li (University of California, Irvine), Iftekhar Ahmed (University of California at Irvine)
16:45 7m Talk: On the Significance of Category Prediction for Code-Comment Synchronization. Journal-First Papers. Zhen Yang (City University of Hong Kong, China), Jacky Keung (City University of Hong Kong), Xiao Yu (Wuhan University of Technology), Yan Xiao (National University of Singapore), Zhi Jin (Peking University), Jingyu Zhang (City University of Hong Kong)
16:52 7m Talk: Correlating Automated and Human Evaluation of Code Documentation Generation Quality. Journal-First Papers. Xing Hu (Zhejiang University), Qiuyuan Chen (Zhejiang University), Haoye Wang (Hangzhou City University), Xin Xia (Huawei), David Lo (Singapore Management University), Thomas Zimmermann (Microsoft Research)
17:00 7m Talk: Predictive Comment Updating with Heuristics and AST-Path-Based Neural Learning: A Two-Phase Approach. Journal-First Papers. Bo Lin (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Zhongxin Liu (Zhejiang University), Xin Xia (Huawei), Xiaoguang Mao (National University of Defense Technology)