On the Significance of Category Prediction for Code-Comment Synchronization
Software comments are not always updated promptly when the associated code changes. The resulting inconsistency between code and comments may mislead developers and lead to future bugs. Research on code-comment synchronization, which aims to automatically update comments to match code changes, has therefore become highly important. Existing code-comment synchronization approaches fall mainly into two types: (1) deep learning-based (e.g., CUP) and (2) heuristic-based (e.g., HebCUP). The former constructs a neural machine translation-structured semantic model, which generalizes better when synchronizing comments as software evolves and grows. The latter, in contrast, designs a series of rules that perform token-level replacements on old comments, and can generate completely correct comments for the samples fully covered by its carefully designed heuristic rules. In this article, we propose a composite approach named CBS (Classifying Before Synchronizing) that further improves code-comment synchronization performance by combining the advantages of CUP and HebCUP with the assistance of inferred categories of Code-Comment Inconsistent (CCI) samples. Specifically, we first define two categories for CCI samples (heuristic-prone and non-heuristic-prone) and propose five features to assist category prediction. Samples whose comments can be correctly synchronized by HebCUP are heuristic-prone; all others are non-heuristic-prone. Then, CBS employs our proposed Multi-Subsets Ensemble Learning (MSEL) classification algorithm to alleviate the class imbalance problem and construct the category prediction model. Next, CBS uses the trained MSEL model to predict the category of a new sample. If the predicted category is heuristic-prone, CBS employs HebCUP to synchronize the sample's comment; otherwise, CBS assigns the sample to CUP.
Our extensive experiments demonstrate that CBS outperforms CUP and HebCUP with statistical significance, obtaining average improvements of 23.47%, 22.84%, 3.04%, 3.04%, 1.64%, and 19.39% in terms of Accuracy, Recall@5, Average Edit Distance (AED), Relative Edit Distance (RED), BLEU-4, and Effective Synchronized Sample (ESS) ratio, respectively. These results highlight that category prediction for CCI samples can boost code-comment synchronization performance.
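The classify-then-dispatch flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how MSEL builds its subsets or which base learner it uses, so the sketch simply splits the larger class into disjoint subsets, trains one toy nearest-centroid classifier per subset together with all samples of the smaller class, and takes a majority vote. All function names here (`train_centroid`, `train_multi_subset_ensemble`, `cbs_synchronize`) and the two-element feature vectors are hypothetical, and the two synchronizers are stand-ins for HebCUP and CUP.

```python
from statistics import mean
from typing import Callable, List, Sequence

Features = Sequence[float]
Classifier = Callable[[Features], bool]  # True means "heuristic-prone"


def train_centroid(pos: List[Features], neg: List[Features]) -> Classifier:
    """Toy base learner (an assumption): predict the class with the nearer centroid."""
    def centroid(rows: List[Features]) -> List[float]:
        return [mean(col) for col in zip(*rows)]

    def dist2(a: Features, b: Features) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: dist2(x, c_pos) <= dist2(x, c_neg)


def train_multi_subset_ensemble(
    pos: List[Features], neg: List[Features], n_subsets: int = 3
) -> Classifier:
    """Assumed scheme: split the larger class into disjoint subsets so each
    member sees a more balanced training set, then combine by majority vote."""
    if max(len(pos), len(neg)) < n_subsets or not pos or not neg:
        raise ValueError("need both classes and at least n_subsets majority samples")
    members = []
    for i in range(n_subsets):
        if len(pos) >= len(neg):
            members.append(train_centroid(pos[i::n_subsets], neg))
        else:
            members.append(train_centroid(pos, neg[i::n_subsets]))

    def predict(x: Features) -> bool:
        votes = sum(1 for member in members if member(x))
        return 2 * votes > len(members)

    return predict


def cbs_synchronize(
    features: Features,
    old_comment: str,
    classifier: Classifier,
    heuristic_sync: Callable[[str], str],  # stand-in for HebCUP
    neural_sync: Callable[[str], str],     # stand-in for CUP
) -> str:
    """Route the sample to the heuristic or the neural synchronizer."""
    if classifier(features):
        return heuristic_sync(old_comment)
    return neural_sync(old_comment)
```

In this sketch the synchronizers take only the old comment; the real tools also consume the code change, so a fuller version would pass the whole sample through to whichever synchronizer is selected.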
Wed 17 May | Displayed time zone: Hobart
15:45 - 17:15 | Documentation | Technical Track / Journal-First Papers at Level G - Plenary Room 1 | Chair(s): Denys Poshyvanyk (College of William and Mary)
15:45 | 15m | Talk | Developer-Intent Driven Code Comment Generation | Technical Track | Fangwen Mu (Institute of Software Chinese Academy of Sciences), Xiao Chen (Institute of Software Chinese Academy of Sciences), Lin Shi (ISCAS), Song Wang (York University), Qing Wang (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
16:00 | 15m | Talk | Data Quality Matters: A Case Study of Obsolete Comment Detection | Technical Track | Shengbin Xu (Nanjing University), Yuan Yao (Nanjing University), Feng Xu (Nanjing University), Tianxiao Gu (TikTok Inc.), Jingwei Xu, Xiaoxing Ma (Nanjing University)
16:15 | 15m | Talk | Revisiting Learning-based Commit Message Generation | Technical Track | Jinhao Dong (Peking University), Yiling Lou (Fudan University), Dan Hao (Peking University), Lin Tan (Purdue University)
16:30 | 15m | Talk | Commit Message Matters: Investigating Impact and Evolution of Commit Message Quality | Technical Track
16:45 | 7m | Talk | On the Significance of Category Prediction for Code-Comment Synchronization | Journal-First Papers | Zhen Yang (City University of Hong Kong, China), Jacky Keung (City University of Hong Kong), Xiao Yu (Wuhan University of Technology), Yan Xiao (National University of Singapore), Zhi Jin (Peking University), Jingyu Zhang (City University of Hong Kong)
16:52 | 7m | Talk | Correlating Automated and Human Evaluation of Code Documentation Generation Quality | Journal-First Papers | Xing Hu (Zhejiang University), Qiuyuan Chen (Zhejiang University), Haoye Wang (Hangzhou City University), Xin Xia (Huawei), David Lo (Singapore Management University), Thomas Zimmermann (Microsoft Research)
17:00 | 7m | Talk | Predictive Comment Updating with Heuristics and AST-Path-Based Neural Learning: A Two-Phase Approach | Journal-First Papers | Bo Lin (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Zhongxin Liu (Zhejiang University), Xin Xia (Huawei), Xiaoguang Mao (National University of Defense Technology)