CMDeSum: A Cross-Modal Deliberation Network for Code Summarization
Code summarization aims to generate concise natural language descriptions of source code snippets, reducing the cognitive load developers face when comprehending code. Existing learning-based models that employ single-pass encoder-decoder frameworks cannot use global information to refine local content, while those that use multi-pass encoder-decoder architectures do not consider the structural information of the code. To address these limitations, we propose CMDeSum, a novel deliberation framework that injects cross-modal information in stages to better balance code sequence information and Abstract Syntax Tree (AST) structural information. Specifically, we first retrieve the comments of code segments similar to the given one to serve as drafts, and extract method names and ASTs from the code. Then, in the First-Pass stage, we use code sequences and method names to generate initial comments and refine them based on the drafts. In the Second-Pass stage, we build on the First-Pass results and use the additional AST information to revise the comments, producing the final comments. To evaluate our approach, we conducted experiments on existing Java and Python datasets. The experimental results indicate that, compared with state-of-the-art code summarization models, our model improves BLEU, ROUGE-L, and METEOR by at least 6.3%, 3.0%, and 5.8%, respectively.
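The preparation step described above (retrieving a draft comment from similar code, then extracting the method name and AST) can be illustrated with a minimal sketch. It is not the CMDeSum implementation: the abstract does not state the retrieval method, so token-level Jaccard similarity is an assumption made here, and the function names `retrieve_draft`, `extract_method_name_and_ast`, and `prepare_inputs` are hypothetical. Python's standard `ast` module stands in for the AST extractor.

```python
import ast
import re


def tokens(code):
    """Split code into identifier-like tokens (a deliberately crude lexer)."""
    return re.findall(r"[A-Za-z_]\w*", code)


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed retrieval metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_draft(query_code, corpus):
    """Return the comment paired with the most similar code in `corpus`.

    `corpus` is a list of (code, comment) pairs.
    """
    query = set(tokens(query_code))
    _, comment = max(corpus, key=lambda pair: jaccard(query, set(tokens(pair[0]))))
    return comment


def extract_method_name_and_ast(code):
    """Return the first function's name and a flat list of its AST node types."""
    tree = ast.parse(code)
    func = next((n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)), None)
    if func is None:
        raise ValueError("no function definition found")
    return func.name, [type(n).__name__ for n in ast.walk(func)]


def prepare_inputs(query_code, corpus):
    """Bundle the inputs that the two passes would consume."""
    name, ast_nodes = extract_method_name_and_ast(query_code)
    return {
        "code_tokens": tokens(query_code),        # First-Pass input
        "method_name": name,                      # First-Pass input
        "draft": retrieve_draft(query_code, corpus),  # used to refine the first pass
        "ast_nodes": ast_nodes,                   # Second-Pass input
    }
```

In this sketch the First-Pass stage would consume `code_tokens`, `method_name`, and `draft`, while `ast_nodes` is held back for the Second-Pass stage, mirroring the staged injection of information the abstract describes.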