DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production (ICSE 2025 - Software Engineering in Practice (SEIP))

Who

Xiaoyun Liang, Jingyi Ren, Jiayi Qi, Chao Peng, Bo Jiang

Track

ICSE 2025 SE In Practice (SEIP)

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 30 Apr 2025 12:15 - 12:30 at 214 - AI for Testing and QA 1 Chair(s): Jieshan Chen

Abstract

Large Language Models (LLMs) have become increasingly integral to enhancing developer productivity, particularly in code generation, comprehension, and repair tasks. However, fine-tuning these models with high-quality, real-world data is challenging due to privacy concerns and the lack of accessible, labeled datasets. In this paper, we present DialogAgent, an automated tool for generating synthetic training data that closely mimics real developer interactions within Integrated Development Environments (IDEs). DialogAgent enables the production of diverse, high-fidelity query-response pairs by simulating multi-turn dialogues and contextual behaviors observed in real-world programming scenarios. The tool significantly reduces the reliance on manual data generation, increasing efficiency by 4.8 times compared to traditional methods. Our experiments and online deployment demonstrate substantial improvements in model performance for code-related question-answering tasks: the acceptance rate of responses generated by our in-house model improves by 33% after training on synthesized data generated by DialogAgent.

Xiaoyun Liang

ByteDance

China

Jingyi Ren

ByteDance

Jiayi Qi

ByteDance

China

Chao Peng

ByteDance

China

Bo Jiang

Bytedance Network Technology

China

Session Program

Wed 30 Apr
Displayed time zone: Eastern Time (US & Canada)

11:00 - 12:30	AI for Testing and QA 1 (Research Track / SE In Practice (SEIP)) at 214 Chair(s): Jieshan Chen CSIRO's Data61

11:00 15m Talk		Does GenAI Make Usability Testing Obsolete? (Award Winner) Research Track Ali Ebrahimi Pourasad, Walid Maalej University of Hamburg
11:15 15m Talk		Feature-Driven End-To-End Test Generation Research Track Parsa Alian University of British Columbia, Noor Nashid University of British Columbia, Mobina Shahbandeh University of British Columbia, Taha Shabani University of British Columbia, Ali Mesbah University of British Columbia
11:30 15m Talk		SeeAction: Towards Reverse Engineering How-What-Where of HCI Actions from Screencasts for UI Automation (Award Winner) Research Track Dehai Zhao CSIRO's Data61, Zhenchang Xing CSIRO's Data61, Qinghua Lu Data61, CSIRO, Xiwei (Sherry) Xu Data61, CSIRO, Liming Zhu CSIRO’s Data61
11:45 15m Talk		Synthesizing Document Database Queries using Collection Abstractions Research Track Qikang Liu Simon Fraser University, Yang He Simon Fraser University, Yanwen Cai Simon Fraser University, Byeongguk Kwak Simon Fraser University, Yuepeng Wang Simon Fraser University
12:00 15m Talk		The Power of Types: Exploring the Impact of Type Checking on Neural Bug Detection in Dynamically Typed Languages Research Track Boqi Chen McGill University, José Antonio Hernández López Linköping University, Gunter Mussbacher McGill University, Daniel Varro Linköping University / McGill University
12:15 15m Talk		DialogAgent: An Auto-engagement Agent for Code Question Answering Data Production SE In Practice (SEIP) Xiaoyun Liang ByteDance, Jingyi Ren ByteDance, Jiayi Qi ByteDance, Chao Peng ByteDance, Bo Jiang Bytedance Network Technology