LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition (ICPC 2022 - Research)

Who

Rishab Sharma, Fuxiang Chen, Fatemeh Hendijani Fard

Track

ICPC 2022 Research

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 15 May 2022 21:58 - 22:05 at ICPC room - Session 1: Summarization Chair(s): Haipeng Cai

Abstract

Code comment generation is the task of generating a high-level natural language description for a given code method or function. Although researchers have studied multiple ways to generate code comments automatically, previous work mainly represents each code token only as a whole, in its semantic form (e.g., a language model is used to learn the semantics of a code token), and includes additional code properties, such as the tree structure of the code, as an auxiliary input to the model. This has two limitations: 1) learning a code token as a whole may not capture the information in source code succinctly, and 2) the code token itself carries no additional syntactic information, which is inherently important in programming languages.

In this paper, we present LAnguage Model and Named Entity Recognition (LAMNER), a code comment generator capable of encoding code constructs effectively and capturing the structural property of a code token. A character-level language model is used to learn the semantic representation to encode a code token. For the structural property of a token, a Named Entity Recognition model is trained to learn the different types of code tokens. These representations are then fed into an encoder-decoder architecture to generate code comments. We evaluate the generated comments from LAMNER and other baselines on a popular Java dataset with four commonly used metrics. Our results show that LAMNER is effective and improves over the best baseline model in BLEU-1, BLEU-2, BLEU-3, BLEU-4, ROUGE-L, METEOR, and CIDEr by 14.34%, 18.98%, 21.55%, 23.00%, 10.52%, 1.44%, and 25.86%, respectively. Additionally, we fused LAMNER’s code representation with the baseline models, and the fused models consistently showed improvement over the non-fused models. The human evaluation further shows that LAMNER produces high-quality code comments. We believe that LAMNER can benefit the community, and we will open-source LAMNER.
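The pipeline described in the abstract (a character-derived semantic vector per token, a token-type vector from an NER model, both passed to an encoder-decoder) can be illustrated with a toy sketch. This is not the authors' implementation: the vector sizes, the character hashing, the mean pooling, the rule-based type tagger, the three-label type inventory, and fusion by concatenation are all assumptions made here for illustration, and every name in the code is hypothetical.

```python
# Toy illustration only: every table, size, and rule here is a hypothetical
# stand-in, not the LAMNER models or their learned weights.
from typing import Dict, List

CHAR_DIM = 4  # assumed size of the character-derived (semantic) vector
TYPE_DIM = 3  # assumed size of the token-type (structural) vector

# Hypothetical token-type inventory; a trained NER model would predict labels.
TYPE_VECTORS: Dict[str, List[float]] = {
    "keyword": [1.0, 0.0, 0.0],
    "identifier": [0.0, 1.0, 0.0],
    "other": [0.0, 0.0, 1.0],
}
JAVA_KEYWORDS = {"public", "int", "return", "static", "void"}


def char_vector(ch: str) -> List[float]:
    """Deterministic stand-in for a learned character embedding."""
    code = ord(ch)
    return [((code * (i + 1)) % 17) / 16.0 for i in range(CHAR_DIM)]


def semantic_embedding(token: str) -> List[float]:
    """Build a token vector from its characters (mean pooling is assumed)."""
    vectors = [char_vector(ch) for ch in token]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def token_type(token: str) -> str:
    """Rule-based stand-in for an NER model's token-type prediction."""
    if token in JAVA_KEYWORDS:
        return "keyword"
    if token.isidentifier():
        return "identifier"
    return "other"


def fuse(token: str) -> List[float]:
    """Concatenate semantic and structural vectors (fusion method is assumed)."""
    return semantic_embedding(token) + TYPE_VECTORS[token_type(token)]


# Each non-empty token becomes one fused vector; the sequence of vectors is
# what an encoder-decoder model would consume to generate a comment.
tokens = ["public", "int", "getCount", "(", ")"]
encoder_input = [fuse(t) for t in tokens]
```

The point of the sketch is only the shape of the data: each code token contributes one vector whose first part depends on its characters and whose second part depends on its syntactic role, so two tokens with similar spelling but different roles still receive different inputs.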

Link to Preprint

https://arxiv.org/abs/2204.09654

Rishab Sharma

University of British Columbia

Fuxiang Chen

University of British Columbia

Fatemeh Hendijani Fard

University of British Columbia

Canada

Session Program

Sun 15 May
Displayed time zone: Eastern Time (US & Canada)

21:30 - 22:20	Session 1: Summarization (Research) at ICPC room. Chair(s): Haipeng Cai, Washington State University, USA

21:30 7m Talk		PTM4Tag: Sharpening Tag Recommendation of Stack Overflow with Pre-trained Models (Research). Junda He, Bowen Xu, Zhou Yang, DongGyun Han, Chengran Yang, David Lo (all Singapore Management University)
21:37 7m Talk		GypSum: Learning Hybrid Representations for Code Summarization (Research). Yu Wang, Yu Dong, Xuesong Lu (all School of Data Science and Engineering, East China Normal University); Aoying Zhou (East China Normal University)
21:44 7m Talk		M2TS: Multi-Scale Multi-Modal Approach Based on Transformer for Source Code Summarization (Research). Yuexiu Gao, Chen Lyu (both Shandong Normal University)
21:51 7m Talk		Semantic Similarity Metrics for Evaluating Source Code Summarization (Research). Sakib Haque, Zachary Eberhart, Aakash Bansal, Collin McMillan (all University of Notre Dame)
21:58 7m Talk		LAMNER: Code Comment Generation Using Character Language Model and Named Entity Recognition (Research). Rishab Sharma, Fuxiang Chen, Fatemeh Hendijani Fard (all University of British Columbia)
22:05 15m Live Q&A		Q&A-Paper Session 1 (Research)
