Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough?
Automated code summarization is a long-standing goal for code comprehension. The task consists of automatically generating documentation for a given method. Deep Learning (DL)-based approaches have proven beneficial for various software engineering (SE) tasks, including this one. Most state-of-the-art datasets for code summarization are automatically mined from GitHub and, thus, might contain erroneous or sub-optimal examples. Previous work showed that a simple rule-based approach for removing noisy instances allows for a tangible reduction of the training set size without reducing the effectiveness of the trained models. Motivated by this finding, we conjecture that it is possible to further reduce the dataset size by removing instances that exhibit other kinds of issues. In this paper, we explore the extent to which code-comment coherence, a specific quality attribute of code summaries, can be used to optimize code summarization datasets. Specifically, we hypothesize that removing incoherent code-comment pairs might positively impact the effectiveness of the models. To test this hypothesis, we rely on SIDE, a recently introduced metric for code-summary coherence. We examine multiple selectivity levels of training instances from two state-of-the-art datasets (TL-CodeSum and Funcom) and evaluate the resulting models on three manually curated test sets. The results show that even halving the training set sizes does not significantly affect the model’s ability to generate summaries. However, when comparing the most restrictive selection strategy with a simpler one that randomly selects the same number of training instances, we observe that the resulting accuracy of the model also does not change. This result suggests that (i) current datasets contain many irrelevant examples, and (ii) different quality attributes should be explored for optimizing code summarization datasets.
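The two selection strategies compared in the abstract can be illustrated with a minimal sketch. It assumes a coherence scorer is available as a plain function from a (code, summary) pair to a number; the `toy_score` function below is a hypothetical word-overlap stand-in and is not SIDE, and the function names and the threshold value are illustrative assumptions rather than the authors' setup.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (method source code, summary)


def select_by_coherence(
    pairs: Sequence[Pair],
    score: Callable[[str, str], float],
    threshold: float,
) -> List[Pair]:
    """Keep only the pairs whose coherence score reaches the threshold."""
    return [(code, summary) for code, summary in pairs if score(code, summary) >= threshold]


def select_random(pairs: Sequence[Pair], k: int, seed: int = 0) -> List[Pair]:
    """Baseline: draw k pairs uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(pairs), k)


def toy_score(code: str, summary: str) -> float:
    """Hypothetical stand-in for a coherence metric (NOT SIDE):
    the fraction of summary words that occur somewhere in the code."""
    words = summary.lower().split()
    if not words:
        return 0.0
    code_lower = code.lower()
    return sum(word in code_lower for word in words) / len(words)


if __name__ == "__main__":
    dataset = [
        ("def add(a, b): return a + b", "add a and b"),
        ("def close(f): f.close()", "opens the network socket"),
    ]
    coherent = select_by_coherence(dataset, toy_score, threshold=0.5)
    # The random baseline is sized to match the filtered set, so that
    # any difference between the two comes from *which* instances are kept.
    baseline = select_random(dataset, k=len(coherent))
    print(len(coherent), len(baseline))
```

Sizing the random baseline to match the filtered set is what makes the comparison informative: if both subsets train equally effective models, the coherence filter is not what drives the result.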
Sun 27 Apr | Displayed time zone: Eastern Time (US & Canada)
16:00 - 17:30 | Summarisation, Natural Language Generation | Research Track / Early Research Achievements (ERA) / Replications and Negative Results (RENE) at 205 | Chair(s): Oscar Chaparro (William & Mary), Coen De Roover (Vrije Universiteit Brussel), Gema Rodríguez-Pérez (Department of Computer Science, Mathematics, Physics and Statistics, University of British Columbia, Okanagan Campus)
16:00 | 10m | Talk | Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough? | Research Track | Antonio Vitale (Politecnico di Torino, University of Molise), Antonio Mastropaolo (William and Mary, USA), Rocco Oliveto (University of Molise), Massimiliano Di Penta (University of Sannio, Italy), Simone Scalabrino (University of Molise)
16:10 | 10m | Talk | CMDeSum: A Cross-Modal Deliberation Network for Code Summarization | Research Track | Zhifang Liao (Central South University), Xiaoyu Liu (Central South University), Peng Lan (School of Computer Science and Engineering, Central South University, Changsha, China), Song Yu (Central South University), Pei Liu (Monash University)
16:20 | 10m | Talk | CLCoSum: Curriculum Learning-based Code Summarization for Code Language Models | Research Track | Hongkui He (South China University of Technology), Jiexin Wang (South China University of Technology), Liuwen Cao (South China University of Technology), Yi Cai (School of Software Engineering, South China University of Technology, Guangzhou, China)
16:30 | 10m | Talk | DLCoG: A Novel Framework for Dual-Level Code Comment Generation based on Semantic Segmentation and In-Context Learning | Research Track | Zhang Zhiyang, Haiyang Yang (School of Computer Science and Engineering, Central South University), Qingyang Yan (Central South University), Hao Yan (Central South University), Wei-Huan Min (Central South University), Zhao Wei (Tencent), Li Kuang (Central South University), Yingjie Xia (Hangzhou Dianzi University)
16:40 | 10m | Talk | Explaining GitHub Actions Failures with Large Language Models: Challenges, Insights, and Limitations | Research Track | Pablo Valenzuela-Toledo (University of Bern, Universidad de La Frontera), Chuyue Wu (University of Bern), Sandro Hernández (University of Bern), Alexander Boll (University of Bern), Roman Machacek (University of Bern), Sebastiano Panichella (University of Bern), Timo Kehrer (University of Bern)
16:50 | 10m | Talk | Large Language Models are Qualified Benchmark Builders: Rebuilding Pre-Training Datasets for Advancing Code Intelligence Tasks | Research Track | Kang Yang (National University of Defense Technology), Xinjun Mao (National University of Defense Technology), Shangwen Wang (National University of Defense Technology), Yanlin Wang (Sun Yat-sen University), Tanghaoran Zhang (National University of Defense Technology), Yihao Qin (National University of Defense Technology), Bo Lin (National University of Defense Technology), Zhang Zhang (Key Laboratory of Software Engineering for Complex Systems, National University of Defense Technology), Yao Lu (National University of Defense Technology), Kamal Al-Sabahi (College of Banking and Financial Studies) | Pre-print
17:00 | 10m | Talk | Extracting Formal Specifications from Documents Using LLMs for Test Automation | Research Track | Hui Li (Xiamen University), Zhen Dong (Fudan University), Siao Wang (Fudan University), Hui Zhang (Fudan University), Liwei Shen (Fudan University), Xin Peng (Fudan University), Dongdong She (HKUST, The Hong Kong University of Science and Technology)
17:10 | 6m | Talk | Using Large Language Models to Generate Concise and Understandable Test Case Summaries | Early Research Achievements (ERA) | Natanael Djajadi (Delft University of Technology), Amirhossein Deljouyi (Delft University of Technology), Andy Zaidman (TU Delft) | Pre-print
17:16 | 6m | Talk | Towards Generating the Rationale for Code Changes | Replications and Negative Results (RENE) | Francesco Casillo (Università di Salerno), Antonio Mastropaolo (William and Mary, USA), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana), Vincenzo Deufemia (University of Salerno), Carmine Gravino (University of Salerno)
17:22 | 8m | Talk | Session's Discussion: "Summarisation, Natural Language Generation" | Research Track