Using Large Language Models to Support the Workflow of Differential Testing (FSE 2025 - Industry Papers)

Mon 23 - Fri 27 June 2025 Trondheim, Norway

Who

Arun Krishna Vajjala, Ajay Krishna Vajjala, Carmen Badea, Christian Bird, Jade D'Souza, Robert DeLine, Mikhail Demyanyuk, Jason Entenmann, Nicole Forsgren, Aliaksandr Hramadski, Haris Mohammad, Sandeepan Sanyal, Oleg Surmachev, Thomas Zimmermann

Track

FSE 2025 Industry Papers

When

Wed 25 Jun 2025 14:30 - 14:50 at Cosmos 3D - Testing 4 Chair(s): Antonio Mastropaolo

Abstract

Many software development teams use differential testing as a quality gate in their release process. Differential testing, which compares the behavior of a system in production with that of a system in test, requires laboriously labeling each behavioral difference as a regression, an expected change, or an incidental change (e.g., one due to nondeterminism or timing). This manual process involves inspecting large textual artifacts, such as logs, pull requests, and team discussions, which suggests that Large Language Models (LLMs) could be helpful. In this paper, we engaged with the team developing a central Azure service to understand their work practice for differential testing. We used a design probe method to elicit feedback about several ways to use LLMs to improve their work practice, including automatically labeling behavior differences and providing summaries of various artifacts and discussions. Release engineers on the team reported that predicting a difference’s label would save them effort, but they wanted an explicit rationale to improve their trust in the prediction; they found the generated summaries to be informative, if a bit wordy.
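To make the idea of differential testing concrete, the following is a minimal Python sketch, not the tooling described in the paper. It runs a "production" and a "candidate" implementation on the same inputs and records every input on which they disagree. The three labels come from the abstract; the names `Difference`, `differential_test`, `production_round`, and `candidate_round` are hypothetical and exist only for illustration. The label is left empty, since assigning it is the manual (or LLM-assisted) step the abstract discusses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Iterable, List, Optional


class Label(Enum):
    # The three labels named in the abstract.
    REGRESSION = "regression"
    EXPECTED = "expected change"
    INCIDENTAL = "incidental change"


@dataclass
class Difference:
    test_input: Any
    production_output: Any
    candidate_output: Any
    label: Optional[Label] = None  # assigned later by a reviewer


def differential_test(
    production: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> List[Difference]:
    """Run both systems on each input and collect the disagreements."""
    differences = []
    for x in inputs:
        prod_out = production(x)
        cand_out = candidate(x)
        if prod_out != cand_out:
            differences.append(Difference(x, prod_out, cand_out))
    return differences


# Hypothetical stand-ins for the system in production and the system in test.
def production_round(x: float) -> int:
    return int(x)  # truncates toward zero


def candidate_round(x: float) -> int:
    return round(x)  # rounds to nearest, ties to even


diffs = differential_test(production_round, candidate_round, [0.2, 1.7, 2.5, -0.6])
for d in diffs:
    print(d.test_input, d.production_output, d.candidate_output, d.label)
```

In this toy case the two functions disagree on 1.7 and -0.6. Whether each disagreement is a regression or an expected change depends on the intent behind the change, which is why the labeling step needs context such as pull requests and team discussions.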

Arun Krishna Vajjala (George Mason University, United States)
Ajay Krishna Vajjala (George Mason University)
Carmen Badea (Microsoft Research, United States)
Christian Bird (Microsoft Research, United States)
Jade D'Souza (Microsoft)
Robert DeLine (Microsoft Research)
Mikhail Demyanyuk (Microsoft)
Jason Entenmann (Microsoft Research)
Nicole Forsgren (Microsoft Research, United States)
Aliaksandr Hramadski (Microsoft)
Haris Mohammad (Microsoft)
Sandeepan Sanyal (Microsoft)
Oleg Surmachev (Microsoft)
Thomas Zimmermann (University of California, Irvine, United States)

Session Program

Wed 25 Jun
Displayed time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

14:00 - 15:20	Testing 4 (Industry Papers / Research Papers / Demonstrations) at Cosmos 3D. Chair(s): Antonio Mastropaolo (William and Mary, USA)

14:00 20m Talk	Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing [Research Papers]: Weibin Wu (Sun Yat-sen University), Yuhang Cao (Sun Yat-sen University), Ning Yi (Sun Yat-sen University), Rongyi Ou (Sun Yat-sen University), Zibin Zheng (Sun Yat-sen University)
14:20 10m Talk	A Tool for Generating Exceptional Behavior Tests With Large Language Models [Demonstrations]: Linghan Zhong (University of Texas Austin), Samuel Yuan (The University of Texas at Austin), Jiyang Zhang (University of Texas at Austin), Yu Liu (Meta), Pengyu Nie (University of Waterloo), Junyi Jessy Li (University of Texas at Austin, USA), Milos Gligoric (The University of Texas at Austin)
14:30 20m Talk	Using Large Language Models to Support the Workflow of Differential Testing [Industry Papers]: Arun Krishna Vajjala (George Mason University), Ajay Krishna Vajjala (George Mason University), Carmen Badea (Microsoft Research), Christian Bird (Microsoft Research), Jade D'Souza (Microsoft), Robert DeLine (Microsoft Research), Mikhail Demyanyuk (Microsoft), Jason Entenmann (Microsoft Research), Nicole Forsgren (Microsoft Research), Aliaksandr Hramadski (Microsoft), Haris Mohammad (Microsoft), Sandeepan Sanyal (Microsoft), Oleg Surmachev (Microsoft), Thomas Zimmermann (University of California, Irvine)
14:50 20m Talk	Adaptive Random Testing with Qgrams: the Illusion Comes True [Research Papers]: Matteo Biagiola (Università della Svizzera italiana), Robert Feldt (Chalmers | University of Gothenburg), Paolo Tonella (USI Lugano)
15:10 10m Talk	Dynamic Application Security Testing for Kubernetes Deployment: An Experience Report from Industry [Industry Papers]: Shazibul Islam Shamim (Kennesaw State University), Hanyang Hu (Company A), Akond Rahman (Auburn University)

Information for Participants

Wed 25 Jun 2025 14:00 - 15:20 at Cosmos 3D - Testing 4 Chair(s): Antonio Mastropaolo

Info for room Cosmos 3D:

Cosmos 3D is the fourth room in the Cosmos 3 wing.

When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door with the number “3”, which will stay open during the event.