When Retriever Meets Generator: A Joint Model for Code Comment Generation
This program is tentative and subject to change.
Automatically generating concise, informative comments for source code can lighten the documentation effort and accelerate program comprehension. Retrieval-augmented approaches first fetch code snippets with existing comments and then synthesize a new comment, yet retrieval and generation are typically optimized in isolation, allowing irrelevant neighbors to propagate noise downstream. To tackle this issue, we propose RAGSum, a novel approach that aims at both effectiveness and efficiency in comment recommendation. RAGSum fuses retrieval and generation on top of a single CodeT5 backbone, and we report preliminary results on this unified retrieval-generation framework. A contrastive pre-training phase shapes code embeddings for nearest-neighbor search; these weights then seed end-to-end training with a composite loss that (i) rewards accurate top-k retrieval and (ii) minimizes comment-generation error. In addition, a lightweight self-refinement loop polishes the final output. We evaluated the framework on three cross-language benchmarks (Java, Python, C) and compared it with three well-established baselines. The results show that our approach substantially outperforms the baselines with respect to BLEU, METEOR, and ROUGE-L scores. These early findings indicate that tightly coupling retrieval and generation can raise the ceiling for comment automation, and they motivate forthcoming industrial replications and qualitative developer studies.
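The abstract describes a composite training objective that combines a contrastive retrieval term with a generation term. The sketch below illustrates how such an objective might be assembled; the InfoNCE-style contrastive term, the weighting factor `alpha`, and all function names are illustrative assumptions, not details taken from the paper.

```python
import math

def info_nce(sim_pos, sim_negs, tau=0.07):
    # InfoNCE-style contrastive loss: pull the positive code pair together,
    # push negatives apart. A hypothetical stand-in for the retrieval term;
    # tau is a temperature hyperparameter (value assumed, not from the paper).
    logits = [sim_pos / tau] + [s / tau for s in sim_negs]
    m = max(logits)  # subtract the max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

def generation_nll(token_probs, target_idx):
    # Token-level generation loss: negative log-likelihood of the
    # reference comment token under the model's output distribution.
    return -math.log(token_probs[target_idx])

def composite_loss(retrieval_loss, generation_loss, alpha=0.5):
    # Weighted sum of the two terms; alpha is a hypothetical balancing
    # hyperparameter, not a value reported by the authors.
    return alpha * retrieval_loss + (1 - alpha) * generation_loss

# Toy numbers: one positive similarity, three negatives, and a 3-token
# output distribution where the reference token has probability 0.7.
ret = info_nce(0.9, [0.2, 0.1, -0.3])
gen = generation_nll([0.1, 0.7, 0.2], 1)
total = composite_loss(ret, gen)
print(ret, gen, total)
```

In an end-to-end setup like the one the abstract outlines, both terms would be computed from the same encoder so that gradients from the generation error also reshape the retrieval embeddings.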
Thu 2 Oct | Displayed time zone: Hawaii
13:50 - 14:50 | Program Comprehension and Review 1 | ESEM - Industry, Government, and Community Track / ESEM - Emerging Results and Vision Track / ESEM - Technical Track | Kaiulani II
13:50 | 15m Talk | When Retriever Meets Generator: A Joint Model for Code Comment Generation | ESEM - Emerging Results and Vision Track | Tien L. T. Pham (Hanoi University of Science and Technology), Anh M. T. Bui (Hanoi University of Science and Technology), Huy N. D. Pham (AI Young Talent Academy (AI4Life), Hanoi University of Science and Technology), Alessio Bucaioni (Mälardalen University), Phuong T. Nguyen (University of L'Aquila) | Pre-print
14:05 | 15m Talk | From Assessment to Enhancement of Pull Requests at Scale: Aligning Code Reviews with Developer Competencies Using Large Language Models | ESEM - Industry, Government, and Community Track | Luca Mariotto (Hasso-Plattner Institute), Christian Medeiros Adriano (Hasso Plattner Institute, University of Potsdam), René Eichhorn (Mercedes-Benz Tech Innovation), Daniel Burgstahler (Mercedes-Benz Tech Innovation), Holger Giese (Hasso Plattner Institute, University of Potsdam)
14:20 | 15m Talk | Rethinking Code Review Workflows with LLM Assistance: An Empirical Study | ESEM - Industry, Government, and Community Track | Fannar Steinn Aðalsteinsson (WirelessCar Sweden AB & Chalmers University of Technology), Björn Borgar Magnússon (WirelessCar Sweden AB), Mislav Milicevic (WirelessCar Sweden AB), Adam Nirving Davidsson (WirelessCar Sweden AB), Chih-Hong Cheng (Carl von Ossietzky Universität Oldenburg & Chalmers University of Technology)
14:35 | 15m Talk | Interrogative Comments Posed by Review Comment Generators: An Empirical Study of Gerrit | ESEM - Technical Track | Farshad Kazemi (University of Waterloo), Maxime Lamothe (Polytechnique Montreal), Shane McIntosh (University of Waterloo) | Pre-print