Software maintenance constitutes a substantial portion of a software system’s lifetime cost, with comprehension of existing code accounting for roughly half of these expenses [Nguyen(2010)]. While inline comments and documentation can substantially ease code reading [Stapleton et al.(2020)], [Tenny(1988)], current automated approaches often produce unstructured, irrelevant, or overly verbose annotations, and they typically document only code snippets that developers have manually selected. Such manual selection not only incurs additional developer effort but also risks overlooking critical code regions, leading to coverage gaps in documentation. Moreover, the vast majority of prior work evaluates comment quality using string-similarity metrics such as BLEU, despite evidence that BLEU correlates poorly with actual developer comprehension and task performance [Stapleton et al.(2020)]. This disconnect leaves open the question of whether automatically generated comments truly benefit maintainers in practice. To address these gaps, we introduce ComCat, an expertise-guided context generation pipeline that leverages developer input and LLM prompting to produce concise, accurate, and contextually appropriate comments for C/C++ code.

ComCat operates in five fully automated stages: (1) an LLVM-based code parser that segments input C/C++ files into logical \emph{Snippets} by leveraging Clang’s AST to identify functions, loops, and branches; (2) a fine-tuned Code Classifier based on CodeBERT that predicts the most helpful comment type for each Snippet, achieving a 96% F1-score on held-out examples; (3) a template catalog mapping each comment class to a prompt structure; (4) a prompt generator that composes prompts by combining snippet-level code context with file-level summaries and selected template slots; and (5) ChatGPT-driven comment synthesis followed by automated AST-based insertion into the source. End to end, ComCat takes C/C++ source code as input and produces equivalent, thoroughly commented source code as output, complete with inline comments and function-level summaries. The design of this fully automated pipeline is informed by human-subject research, ensuring the relevance and clarity of generated comments.
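To make these stages concrete, the following is a minimal Python sketch of the architecture described above, not ComCat’s released implementation. It assumes libclang’s Python bindings for Snippet segmentation, a CodeBERT-style sequence classifier (the fine-tuned checkpoint path is hypothetical), a toy three-entry template catalog, and an OpenAI chat model for synthesis; insertion of the generated comments back into the source is omitted.

\begin{verbatim}
# Illustrative sketch of the five-stage pipeline; names marked "hypothetical"
# or "illustrative" are assumptions, not ComCat's actual artifacts.
import clang.cindex
from clang.cindex import CursorKind
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openai import OpenAI

# (3) Toy template catalog: one prompt skeleton per comment class.
TEMPLATES = {
    "function": "Write a one-sentence summary comment for this C function:\n{code}",
    "branch": "Write a short comment explaining what this branch checks and why:\n{code}",
    "snippet": "Write a concise inline comment describing what this code does:\n{code}",
}

def extract_snippets(path):
    """(1) Segment a C/C++ file into functions, loops, and branches via Clang's AST."""
    index = clang.cindex.Index.create()
    tu = index.parse(path)
    kinds = {CursorKind.FUNCTION_DECL, CursorKind.FOR_STMT,
             CursorKind.WHILE_STMT, CursorKind.IF_STMT}
    with open(path) as f:
        lines = f.readlines()
    for cursor in tu.cursor.walk_preorder():
        if cursor.kind in kinds and cursor.location.file \
                and cursor.location.file.name == path:
            start, end = cursor.extent.start.line, cursor.extent.end.line
            yield "".join(lines[start - 1:end])

def classify(snippet, tokenizer, model):
    """(2) Predict the most helpful comment type for a Snippet."""
    inputs = tokenizer(snippet, truncation=True, return_tensors="pt")
    label_id = model(**inputs).logits.argmax(dim=-1).item()
    return model.config.id2label[label_id]

def generate_comment(snippet, comment_type, file_summary, client):
    """(4)+(5) Fill the template with snippet and file-level context, then query the LLM."""
    prompt = f"File summary: {file_summary}\n" + TEMPLATES[comment_type].format(code=snippet)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    clf = AutoModelForSequenceClassification.from_pretrained(
        "path/to/finetuned-classifier")  # hypothetical fine-tuned checkpoint
    client = OpenAI()
    for snippet in extract_snippets("example.c"):
        ctype = classify(snippet, tok, clf)
        if ctype in TEMPLATES:
            print(generate_comment(snippet, ctype, "Example file summary.", client))
\end{verbatim}

In the tool itself, the template catalog covers the comment categories derived from our human-subject research, and the synthesized comments are inserted back into the source at AST-determined locations rather than printed.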

We validate ComCat through three human-subject studies. First, in HSR1 (n=24), we derive a 12-category schema and a corresponding dataset of classified code-comment pairs, identifying the five comment types expert participants found most useful for comprehension (function, variable, snippet functionality, branch, reasoning). Using this schema, we train the Code Classifier and construct developer-guided templates. Next, in HSR2 (n=54), participants complete short-answer, code-writing, and debugging tasks on code commented in different ways, and we measure their performance. HSR2 demonstrates that ComCat-generated comments increase overall correctness by 13.3 percentage points over human-written comments and 16.3 percentage points over naively prompted ChatGPT (from 71.4% and 68.4%, respectively, to 84.7% correctness; $p<0.001$). Finally, in HSR3 (n=32), we find that developers prefer ComCat comments to human-written and standard ChatGPT outputs in 66%–82% of trials ($p<0.001$), indicating strong subjective endorsement. Beyond correctness and preference, ComCat improves comment consistency over standard ChatGPT by roughly 70% (BLEU 0.334 vs.\ 0.197) and yields perfect run-to-run code stability (BLEU = 1.000); here we use BLEU only as a consistency measure, not as a proxy for comprehension. These improvements in correctness and developer preference show that ComCat increases code comprehensibility, a factor known to boost productivity and lower maintenance costs [Nguyen(2010)].

Our contributions include: (1) a developer-validated taxonomy and annotated dataset of C/C++ comments, (2) a catalog of LLM prompt templates for targeted comment generation, (3) the ComCat tool for automated expertise-guided commenting, and (4) empirical evidence of its effectiveness in improving code comprehension and developer satisfaction. We release our dataset and implementation (https://osf.io/tf2eu/?view_only=4dcf8efa50a346dc96d28b139b0d3b90) to support future research on AI-assisted software documentation.