FSE 2025
Mon 23 - Fri 27 June 2025 Trondheim, Norway

Code documentation is a critical aspect of software development, serving as a bridge between human understanding and machine-readable code. Beyond helping developers understand and maintain code, documentation also plays an important role in automating software engineering tasks such as test oracle generation (TOG). In Java, Javadoc comments provide structured, natural-language documentation embedded directly in the source code, typically detailing functionality, usage, parameters, return values, and exceptions. While prior research has used Javadoc comments in TOG, there has been no thorough investigation of their impact when combined with other contextual information, of which components are most relevant for generating correct and strong test oracles, or of their role in detecting real bugs. In this study, we investigate the impact of Javadoc comments on TOG in depth. We start by fine-tuning 10 large language models with three different prompt pairs designed to investigate the impact of Javadoc comments when used with other contextual information. We then conduct a systematic analysis to assess the impact of different Javadoc components on TOG. To investigate the generalizability of Javadoc comments from different sources, we also generate Javadoc comments using the GPT-3.5 model. Finally, we perform a thorough bug detection study on Defects4J to understand the role of Javadoc comments in real-world bug detection. Our results show that, in most cases, incorporating Javadoc comments improves the accuracy of test oracles, which then align more closely with the ground truth. We find that Javadoc comments alone can nearly match the performance achieved when using both Javadoc comments and the code of the method under test (MUT). We also find that the description and the return tag of a Javadoc comment are the most valuable components for TOG. Finally, when using just Javadoc comments, our method detects between 19% and 94% more real-world bugs in Defects4J than prior methods.
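
To make the terminology concrete, the following is a minimal illustrative sketch of the Javadoc components named above (description, `@param`, `@return`, `@throws`) and of the two kinds of test oracle they can support: an assertion oracle and an exception oracle. The class `JavadocOracleSketch`, the method `clamp`, and the helper `check` are hypothetical and are not taken from the study or its dataset; the oracles are written by hand here, whereas the study generates them with fine-tuned models.

```java
public class JavadocOracleSketch {

    /**
     * Restricts a value to the closed interval [min, max].
     *
     * @param value the number to restrict
     * @param min   the lower bound, inclusive
     * @param max   the upper bound, inclusive
     * @return {@code min} if value is below it, {@code max} if value is above it,
     *         otherwise value unchanged
     * @throws IllegalArgumentException if {@code min} is greater than {@code max}
     */
    public static int clamp(int value, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return Math.max(min, Math.min(max, value));
    }

    // Hypothetical helper: fails loudly without relying on the JVM -ea flag.
    public static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // Assertion oracles: derivable from the description and the @return tag alone.
        check(clamp(15, 0, 10) == 10, "value above max should yield max");
        check(clamp(-3, 0, 10) == 0, "value below min should yield min");
        check(clamp(7, 0, 10) == 7, "value inside the interval is unchanged");

        // Exception oracle: derivable from the @throws tag.
        boolean thrown = false;
        try {
            clamp(1, 10, 0);
        } catch (IllegalArgumentException expected) {
            thrown = true;
        }
        check(thrown, "min > max should raise IllegalArgumentException");

        System.out.println("all oracles hold");
    }
}
```

Note that every oracle in `main` can be written without reading the body of `clamp`, which is the situation the Javadoc-only setting in the abstract refers to.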