Enhancing Differential Testing: LLM-Powered Automation in Release Engineering
This program is tentative and subject to change.
In modern software engineering, efficient release engineering workflows are essential for quickly delivering new features to production. This not only improves company productivity but also provides customers with frequent updates, which can lead to increased profits. At Microsoft, we collaborated with the Identity and Network Access (IDNA) team to automate their release engineering workflows. They use differential testing to compare test and production environments and classify the resulting behavior differences, which helps them assess how new changes perform with real-world traffic before pushing updates to production. This process enhances resiliency and helps ensure that changes to the system are robust. However, on-call engineers (OCEs) must manually label hundreds or thousands of behavior differences, which is time-consuming. In this work, we present a method leveraging Large Language Models (LLMs) to automate the classification of these differences, saving OCEs a significant amount of time. Our experiments demonstrate that LLMs are effective classifiers for behavior differences, which can speed up release workflows and improve OCE productivity.
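The abstract does not include implementation details, but a minimal sketch of the core idea, asking an LLM to label a single test-vs-production behavior difference, might look like the following. The prompt, label set, model name, and `classify_difference` helper are illustrative assumptions for this sketch, not the authors' actual implementation.

```python
# Minimal sketch of LLM-based behavior-difference classification.
# Hypothetical example: the prompt wording, label set, and model name
# are assumptions, not the method described in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["expected", "unexpected", "noise"]  # hypothetical label set

def classify_difference(test_response: str, prod_response: str) -> str:
    """Ask an LLM to label one test-vs-production behavior difference."""
    prompt = (
        "You are triaging differential-testing results.\n\n"
        f"Test environment response:\n{test_response}\n\n"
        f"Production environment response:\n{prod_response}\n\n"
        f"Classify the difference as one of: {', '.join(LABELS)}. "
        "Answer with the label only."
    )
    completion = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model would do here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labels for repeatable triage
    )
    label = completion.choices[0].message.content.strip().lower()
    # Fall back conservatively so unparseable answers still get reviewed.
    return label if label in LABELS else "unexpected"

# An OCE would then review only the differences flagged as unexpected.
print(classify_difference('{"status": 200}', '{"status": 500}'))
```

Setting the temperature to 0 keeps labels reproducible across runs; in a real deployment, the team's own label taxonomy and prompt would replace these placeholders.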
Fri 2 May | Displayed time zone: Eastern Time (US & Canada)
16:00 - 17:30

16:00 | 15m | Talk | OptCD: Optimizing Continuous Development | Demonstrations | Talank Baral (George Mason University), Emirhan Oğul (Middle East Technical University), Shanto Rahman (The University of Texas at Austin), August Shi (The University of Texas at Austin), Wing Lam (George Mason University)

16:15 | 15m | Talk | LLMs as Evaluators: A Novel Approach to Commit Message Quality Assessment | New Ideas and Emerging Results (NIER) | Abhishek Kumar (Indian Institute of Technology, Kharagpur), Sandhya Sankar (Indian Institute of Technology, Kharagpur), Sonia Haiduc (Florida State University), Partha Pratim Das (Indian Institute of Technology, Kharagpur), Partha Pratim Chakrabarti (Indian Institute of Technology, Kharagpur)

16:30 | 15m | Talk | Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings | SE In Practice (SEIP) | Petr Tsvetkov (JetBrains Research), Aleksandra Eliseeva (JetBrains Research), Danny Dig (University of Colorado Boulder; JetBrains Research), Alexander Bezzubov (JetBrains), Yaroslav Golubev (JetBrains Research), Timofey Bryksin (JetBrains Research), Yaroslav Zharov (JetBrains Research) | Pre-print

16:45 | 15m | Talk | Enhancing Differential Testing: LLM-Powered Automation in Release Engineering | SE In Practice (SEIP) | Ajay Krishna Vajjala (George Mason University), Arun Krishna Vajjala (George Mason University), Carmen Badea (Microsoft Research), Christian Bird (Microsoft Research), Robert DeLine (Microsoft Research), Jason Entenmann (Microsoft Research), Nicole Forsgren (Microsoft Research), Aliaksandr Hramadski (Microsoft), Sandeepan Sanyal (Microsoft), Oleg Surmachev (Microsoft), Thomas Zimmermann (University of California, Irvine), Haris Mohammad (Microsoft), Jade D'Souza (Microsoft), Mikhail Demyanyuk (Microsoft)

17:00 | 15m | Talk | How much does AI impact development speed? An enterprise-based randomized controlled trial | SE In Practice (SEIP) | Elise Paradis (Google, Inc.), Kate Grey (Google), Quinn Madison (Google), Daye Nam (Google), Andrew Macvean (Google, Inc.), Nan Zhang (Google), Ben Ferrari-Church (Google), Satish Chandra (Google, Inc.)

17:15 | 15m | Talk | Using Reinforcement Learning to Sustain the Performance of Version Control Repositories | New Ideas and Emerging Results (NIER) | Shane McIntosh (University of Waterloo), Luca Milanesio (GerritForge Inc.), Antonio Barone (GerritForge Inc.), Jacek Centkowski (GerritForge Inc.), Marcin Czech (GerritForge Inc.), Fabio Ponciroli (GerritForge Inc.)