FuseApplyBench: Multilingual Benchmark for Trustworthy Code Edit Applying Task
With the rise of Language Models (LMs) and Large Language Models (LLMs), their potential for code editing (CE) has gained attention. A common approach is to have LLMs generate draft code modifications, which are then refined and applied by smaller LMs in a subsequent Code Editing Apply (CEA) task. However, the CEA task is prone to errors, and existing benchmarks do not systematically evaluate LLM performance in handling these issues. To address this, we introduce FuseApplyBench, a benchmark designed to evaluate LLM performance across three major error types in CEA tasks. On top of FuseApplyBench's pipeline, we collect datasets for fine-tuning a model that improves the reliability of applied code modifications (denoted FuseApply-7B). We benchmark FuseApply-7B, four widely used open-source LLMs, and Kortix-FastApply-7B on FuseApplyBench. Results show that FuseApply-7B significantly improves trustworthiness and accuracy metrics, while the other models show weaker performance, highlighting opportunities for advancing LLM applications in CE.
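To make the CEA setup concrete, the sketch below illustrates the workflow the abstract describes: a large model proposes a draft edit that elides unchanged context, and a smaller apply model must merge that draft back into the original file. This is a minimal, hypothetical illustration only; the file contents, the prompt format, and the naive_apply stand-in are assumptions for exposition, not the paper's pipeline or the FuseApply-7B model.

```python
# Minimal sketch of a Code Editing Apply (CEA) step (hypothetical, not the paper's API).
# A large model emits a draft edit with an "existing code" marker; a smaller apply
# model is expected to reconstruct the full, correctly merged file.

ORIGINAL = """\
def greet(name):
    print("Hello, " + name)

def farewell(name):
    print("Bye, " + name)
"""

# Draft edit from the large model: only the changed function is spelled out.
DRAFT_EDIT = """\
# ... existing code ...

def farewell(name):
    print(f"Goodbye, {name}!")
"""

def build_apply_prompt(original: str, draft: str) -> str:
    """Assemble the input an apply model would receive (illustrative format)."""
    return (
        "Merge the draft edit into the original file and return the full file.\n"
        "<original>\n" + original + "</original>\n"
        "<draft>\n" + draft + "</draft>\n"
    )

def naive_apply(original: str, draft: str) -> str:
    """Toy stand-in for the apply model: replace any block the draft redefines,
    keep everything else unchanged. Real apply models must also handle the
    error cases a CEA benchmark targets (misplaced anchors, missing context,
    conflicting hunks), which this sketch does not."""
    merged = original
    for block in draft.split("\n\n"):
        block = block.strip()
        if not block or block.startswith("#"):
            continue  # skip the "... existing code ..." marker
        header = block.splitlines()[0]  # e.g. "def farewell(name):"
        if header in merged:
            start = merged.index(header)
            end = merged.find("\n\n", start)
            end = len(merged) if end == -1 else end
            merged = merged[:start] + block + merged[end:]
    return merged

if __name__ == "__main__":
    print(build_apply_prompt(ORIGINAL, DRAFT_EDIT))
    print(naive_apply(ORIGINAL, DRAFT_EDIT))
```

The toy merge relies on exact header matches, which is precisely where real apply models fail; a benchmark for the CEA task exercises those failure modes rather than this happy path.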
Sat 28 Jun. Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.
09:00 - 10:30 | Trustworthy AI for Code (EXPRESS) at Cosmos 3B. Chair(s): Peng Di (Ant Group & UNSW Sydney), Puzhuo Liu (Ant Group & Tsinghua University)

09:00 (10m) Day opening: Opening and Welcome (EXPRESS)
09:10 (60m) Keynote: Human-like AI Auditor for Code Repositories (EXPRESS). Xiangyu Zhang, Purdue University
10:10 (20m) Talk: FuseApplyBench: Multilingual Benchmark for Trustworthy Code Edit Applying Task (EXPRESS). Ming Liang (Ant Group), Qingyu Zhang (The University of Hong Kong), Zhipeng Zuo (Ant Group), Shaoqiang Zheng (Ant Group), Dajun Chen (Ant Group), Wei Jiang (Ant Group), Yong Li (Ant Group)
Cosmos 3B is the second room in the Cosmos 3 wing.
When facing the main Cosmos Hall, access to the Cosmos 3 wing is on the left, close to the stairs. The area is reached through a large door marked "3", which will stay open during the event.