Software maintenance constitutes a substantial portion of a software system’s lifetime cost, with comprehension of existing code accounting for roughly half of these expenses [Nguyen(2010)]. While inline comments and documentation can substantially ease code reading [Stapleton et al.(2020)], [Tenny(1988)], current automated approaches often produce unstructured, irrelevant, or overly verbose annotations, and they typically document only code snippets that developers have manually selected. Such manual selection not only incurs additional developer effort but also risks overlooking critical code regions, leading to coverage gaps in documentation. Moreover, the vast majority of prior work evaluates comment quality using string-similarity metrics such as BLEU, despite evidence that BLEU correlates poorly with actual developer comprehension and task performance [Stapleton et al.(2020)]. This disconnect leaves open the question of whether automatically generated comments truly benefit maintainers in practice. To address these gaps, we introduce ComCat, an expertise-guided context generation pipeline that leverages developer input and LLM prompting to produce concise, accurate, and contextually appropriate comments for C/C++ code.

ComCat operates in five fully automated stages: (1) an LLVM-based code parser that segments input C/C++ files into logical \emph{Snippets} by leveraging Clang’s AST to identify functions, loops, and branches; (2) a fine-tuned Code Classifier based on CodeBERT that predicts the most helpful comment type for each Snippet, achieving a 96% F1-score on held-out examples; (3) a template catalog mapping each comment class to a prompt structure; (4) a prompt generator that composes prompts by combining snippet-level code context with file-level summaries and selected template slots; and (5) ChatGPT-driven comment synthesis followed by automated AST-based insertion into the source. End to end, ComCat takes C/C++ source code as input and produces equivalent, thoroughly commented source code as output, complete with inline comments and function-level summaries. The design of this fully automated pipeline is informed by human-subject research, ensuring the relevance and clarity of generated comments.
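To make these stages concrete, the following is a minimal Python sketch of the architecture described above, not ComCat’s released implementation. It assumes libclang’s Python bindings for Snippet segmentation, a CodeBERT-style sequence classifier (the fine-tuned checkpoint path is hypothetical), a toy three-entry template catalog, and an OpenAI chat model for synthesis; insertion of the generated comments back into the source is omitted.

\begin{verbatim}
# Illustrative sketch of the five-stage pipeline; names marked "hypothetical"
# or "illustrative" are assumptions, not ComCat's actual artifacts.
import clang.cindex
from clang.cindex import CursorKind
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openai import OpenAI

# (3) Toy template catalog: one prompt skeleton per comment class.
TEMPLATES = {
    "function": "Write a one-sentence summary comment for this C function:\n{code}",
    "branch": "Write a short comment explaining what this branch checks and why:\n{code}",
    "snippet": "Write a concise inline comment describing what this code does:\n{code}",
}

def extract_snippets(path):
    """(1) Segment a C/C++ file into functions, loops, and branches via Clang's AST."""
    index = clang.cindex.Index.create()
    tu = index.parse(path)
    kinds = {CursorKind.FUNCTION_DECL, CursorKind.FOR_STMT,
             CursorKind.WHILE_STMT, CursorKind.IF_STMT}
    with open(path) as f:
        lines = f.readlines()
    for cursor in tu.cursor.walk_preorder():
        if cursor.kind in kinds and cursor.location.file \
                and cursor.location.file.name == path:
            start, end = cursor.extent.start.line, cursor.extent.end.line
            yield "".join(lines[start - 1:end])

def classify(snippet, tokenizer, model):
    """(2) Predict the most helpful comment type for a Snippet."""
    inputs = tokenizer(snippet, truncation=True, return_tensors="pt")
    label_id = model(**inputs).logits.argmax(dim=-1).item()
    return model.config.id2label[label_id]

def generate_comment(snippet, comment_type, file_summary, client):
    """(4)+(5) Fill the template with snippet and file-level context, then query the LLM."""
    prompt = f"File summary: {file_summary}\n" + TEMPLATES[comment_type].format(code=snippet)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    clf = AutoModelForSequenceClassification.from_pretrained(
        "path/to/finetuned-classifier")  # hypothetical fine-tuned checkpoint
    client = OpenAI()
    for snippet in extract_snippets("example.c"):
        ctype = classify(snippet, tok, clf)
        if ctype in TEMPLATES:
            print(generate_comment(snippet, ctype, "Example file summary.", client))
\end{verbatim}

In the tool itself, the template catalog covers the comment categories derived from our human-subject research, and the synthesized comments are inserted back into the source at AST-determined locations rather than printed.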

We validate ComCat through three human-subject studies. First, in HSR1 (n=24), we derive a 12-category schema and a corresponding dataset of classified code-comment pairs, identifying the five comment types expert participants found most useful for comprehension (function, variable, snippet functionality, branch, reasoning). Using this schema, we train the Code Classifier and construct developer-guided templates. Next, in HSR2 (n=54), participants complete short-answer, code-writing, and debugging tasks on code commented in different ways, and we measure their performance. HSR2 demonstrates that ComCat-generated comments increase overall correctness by 13.3 percentage points over human-written comments and 16.3 percentage points over naively prompted ChatGPT (from 71.4% and 68.4%, respectively, to 84.7% correctness; $p<0.001$). Finally, in HSR3 (n=32), we find that developers prefer ComCat comments to human-written and standard ChatGPT outputs in 66%–82% of trials ($p<0.001$), indicating strong subjective endorsement. Beyond correctness and preference, ComCat improves comment consistency over standard ChatGPT by roughly 70% (BLEU 0.334 vs.\ 0.197) and yields perfect run-to-run code stability (BLEU = 1.000); here we use BLEU only as a consistency measure, not as a proxy for comprehension. These improvements in correctness and developer preference show that ComCat increases code comprehensibility, a factor known to boost productivity and lower maintenance costs [Nguyen(2010)].

Our contributions include: (1) a developer-validated taxonomy and annotated dataset of C/C++ comments, (2) a catalog of LLM prompt templates for targeted comment generation, (3) the ComCat tool for automated expertise-guided commenting, and (4) empirical evidence of its effectiveness in improving code comprehension and developer satisfaction. We release our dataset and implementation (https://osf.io/tf2eu/?view_only=4dcf8efa50a346dc96d28b139b0d3b90) to support future research on AI-assisted software documentation.