Doc2OracLL: Investigating the Impact of Documentation on LLM-based Test Oracle Generation (FSE 2025 - Research Papers)

Mon 23 - Fri 27 June 2025 Trondheim, Norway

co-located with ISSTA 2025

Who

Soneya Binta Hossain, Raygan Taylor, Matthew B Dwyer

Track

FSE 2025 Research Papers

When

Mon 23 Jun 2025 11:00 - 11:20 at Cosmos Hall - Test Generation Chair(s): Michael Pradel

Abstract

Code documentation is a critical aspect of software development, serving as a bridge between human understanding and machine-readable code. Beyond assisting developers in understanding and maintaining code, documentation also plays a critical role in automating various software engineering tasks, such as test oracle generation (TOG). In Java, Javadoc comments provide structured, natural language documentation embedded directly in the source code, typically detailing functionality, usage, parameters, return values, and exceptions. While prior research has used Javadoc comments in TOG, there has been no thorough investigation of their impact when combined with other contextual information, of which components are most relevant for generating correct and strong test oracles, or of their role in detecting real bugs. In this study, we investigate the impact of Javadoc comments on TOG in depth. We start by fine-tuning 10 large language models with three different prompt pairs designed to investigate the impact of Javadoc comments when used with other contextual information. We conduct a systematic analysis to assess the impact of different Javadoc components on TOG. To investigate the generalizability of Javadoc comments from various sources, we also generate Javadoc comments using the GPT-3.5 model. Finally, we perform a thorough bug detection study using Defects4J to understand the role of Javadoc comments in real-world bug detection. Our results show that, in most cases, incorporating Javadoc comments improves the accuracy of test oracles, aligning closely with ground truth. We find that Javadoc comments alone can nearly match the performance achieved when using both Javadoc comments and the code of the method under test (MUT). We find that the description and the return tags of the Javadoc comments are the most valuable in TOG. Finally, when using just Javadoc comments, our method detects between 19% and 94% more real-world bugs in Defects4J than prior methods.
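
To make the terminology above concrete, here is a minimal sketch of a Javadoc comment and the kind of test oracles that could be derived from it. It is an illustration only and is not taken from the paper: the class `JavadocOracleSketch`, the method `clamp`, and the helper names are hypothetical, and plain Java checks stand in for a test framework so the example stays self-contained.

```java
// Hypothetical example: none of these names come from the paper.
final class JavadocOracleSketch {

    /**
     * Clamps a value to the inclusive range [min, max].
     *
     * @param value the value to clamp
     * @param min   the lower bound (inclusive)
     * @param max   the upper bound (inclusive)
     * @return {@code min} if {@code value < min}, {@code max} if
     *         {@code value > max}, otherwise {@code value}
     * @throws IllegalArgumentException if {@code min > max}
     */
    public static int clamp(int value, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return Math.max(min, Math.min(max, value));
    }

    // Small stand-in for a test framework's assertion.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    // Assertion-style oracles: expected values follow from the
    // description and the @return tag, without reading the method body.
    public static void assertionOracles() {
        check(clamp(5, 0, 10) == 5, "value inside the range is unchanged");
        check(clamp(-3, 0, 10) == 0, "value below min yields min");
        check(clamp(42, 0, 10) == 10, "value above max yields max");
    }

    // Exception-style oracle: the @throws tag says an inverted range
    // must raise IllegalArgumentException.
    public static boolean exceptionOracleHolds() {
        try {
            clamp(0, 10, 1);
            return false; // no exception: the documented behavior is violated
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        assertionOracles();
        check(exceptionOracleHolds(), "inverted range must throw");
        System.out.println("all oracles hold");
    }
}
```

In this sketch the description and the `@return` tag are enough to write the three value checks, while the `@throws` tag alone determines the exception check, which is the sense in which documentation can drive oracle generation without the method body.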

DOI

https://doi.org/10.1145/3729354

Soneya Binta Hossain

University of Virginia

United States

Raygan Taylor

Dillard University

Matthew B Dwyer

University of Virginia

Session Program

Mon 23 Jun
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:30 - 12:30	Test Generation (Research Papers / Industry Papers) at Cosmos Hall. Chair(s): Michael Pradel (University of Stuttgart)

10:30 20m Talk		CoverUp: Effective High Coverage Test Generation for Python. Research Papers. Juan Altmayer Pizzorno (University of Massachusetts Amherst), Emery D. Berger (University of Massachusetts Amherst and Amazon Web Services)
11:00 20m Talk		Doc2OracLL: Investigating the Impact of Documentation on LLM-based Test Oracle Generation. Research Papers. Soneya Binta Hossain (University of Virginia), Raygan Taylor (Dillard University), Matthew B Dwyer (University of Virginia)
11:20 20m Talk		Less is More: On the Importance of Data Quality for Unit Test Generation. Research Papers. Junwei Zhang (Zhejiang University), Xing Hu (Zhejiang University), Shan Gao (Huawei), Xin Xia (Zhejiang University), David Lo (Singapore Management University), Shanping Li (Zhejiang University)
11:40 20m Talk		Mutation-Guided LLM-based Test Generation at Meta. Industry Papers. Mark Harman (Meta Platforms, Inc. and UCL), Jillian Ritchey (Meta platforms), Inna Harper (Meta), Shubho Sengupta (Meta platforms), Ke Mao (Meta), Abhishek Gulati (Meta platforms), Christopher Foster (Meta platforms), Hervé Robert (Meta platforms)
12:00 10m Talk		LSPAI: An IDE Plugin for LLM-Powered Multi-Language Unit Test Generation with Language Server Protocol. Industry Papers. Gwihwan Go (Tsinghua University), Chijin Zhou (Tsinghua University), Quan Zhang (Tsinghua University), Yu Jiang (Tsinghua University), Zhao Wei (Tencent)
12:10 20m Talk		Can Generative AI Produce Test Cases? An Experience from the Automotive Domain. Industry Papers. Stephen Wynn-Williams (McMaster University, Canada), Ryan Tyrrell (McMaster University), Vera Pantelic (McMaster University), Mark Lawford (McMaster University), Claudio Menghi (University of Bergamo; McMaster University), Phaneendra Nalla (FCA US LLC), Hassan Artail (FCA US LLC)

Information for Participants

Mon 23 Jun 2025 10:30 - 12:30 at Cosmos Hall - Test Generation Chair(s): Michael Pradel

Info for room Cosmos Hall:

This is the main event hall of Clarion Hotel, which will be used to host keynote talks and other plenary sessions. The FSE and ISSTA banquets will also happen in this room.

The room is just in front of the registration desk, on the other side of the main conference area. The large doors with numbers “1” and “2” provide access to the Cosmos Hall.