UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation
This program is tentative and subject to change.
Deep learning-based code generation has transformed how developers write programs. Existing approaches follow either the Sequence-to-Sequence paradigm, which generates target code as a sequence of tokens, or the Sequence-to-Tree paradigm, which outputs code as a sequence of actions. Although the two paradigms are intuitively complementary, their combination has not been previously explored. By comparing code generated under the two paradigms, we find that integrating them holds significant potential. In this paper, we propose UniGenCoder for code-related generation tasks, which consists of a shared encoder, a shared decoder with a minimal set of additional parameters to unify the two paradigms, and a selector that dynamically chooses the optimal paradigm for each instance. During training, we first apply multi-task learning and distillation strategies to facilitate knowledge transfer between the two paradigms, and then leverage contrastive learning to train the selector. Experimental results on text-to-code and code-to-code generation tasks demonstrate the effectiveness of our proposed model. We will release our code upon acceptance.
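To make the per-instance paradigm selection concrete, here is a minimal pure-Python sketch of the idea: a shared encoder feeds two decoding heads (token sequence vs. action sequence), and a selector decides which head to use for each input. All names, the feature computation, and the scoring heuristic below are illustrative placeholders, not the paper's actual neural components (which use a shared Transformer encoder/decoder and a contrastively trained selector):

```python
# Illustrative-only sketch of per-instance paradigm selection.
# Every function here is a hypothetical stand-in, not from UniGenCoder.

def shared_encode(source_tokens):
    """Stand-in for the shared encoder: maps tokens to toy 'features'."""
    return [sum(ord(c) for c in tok) % 100 for tok in source_tokens]

def seq2seq_decode(features):
    """Stand-in Seq2Seq head: emits target code as a token sequence."""
    return {"paradigm": "seq2seq", "output": ["tok_%d" % f for f in features]}

def seq2tree_decode(features):
    """Stand-in Seq2Tree head: emits code as a sequence of actions."""
    return {"paradigm": "seq2tree", "output": ["ACT_%d" % f for f in features]}

def selector_score(features):
    """Stand-in selector score. In the paper the selector is trained with
    contrastive learning; this arbitrary average is for illustration only."""
    return sum(features) / (len(features) or 1)

def unified_generate(source_tokens, threshold=50):
    """Encode once with the shared encoder, then route the instance to the
    paradigm the selector prefers."""
    features = shared_encode(source_tokens)
    if selector_score(features) >= threshold:
        return seq2tree_decode(features)
    return seq2seq_decode(features)

result = unified_generate(["def", "f", "(", ")", ":"])
print(result["paradigm"], len(result["output"]))
```

The point of the sketch is structural: both heads share one encoding pass, and the selector adds only a small routing decision on top, mirroring the paper's claim of unifying the two paradigms with a minimal set of additional parameters.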
Fri 2 May — Displayed time zone: Eastern Time (US & Canada)
11:00 - 12:30
11:00 | 15m | Talk | A First Look at Conventional Commits Classification | Research Track | Qunhong Zeng (Beijing Institute of Technology), Yuxia Zhang (Beijing Institute of Technology), Zhiqing Qiu (Beijing Institute of Technology), Hui Liu (Beijing Institute of Technology)
11:15 | 15m | Talk | ChatGPT-Based Test Generation for Refactoring Engines Enhanced by Feature Analysis on Examples | Research Track | Chunhao Dong (Beijing Institute of Technology), Yanjie Jiang (Peking University), Yuxia Zhang (Beijing Institute of Technology), Yang Zhang (Hebei University of Science and Technology), Hui Liu (Beijing Institute of Technology)
11:30 | 15m | Talk | SECRET: Towards Scalable and Efficient Code Retrieval via Segmented Deep Hashing | Research Track | Wenchao Gu (The Chinese University of Hong Kong), Ensheng Shi (Xi'an Jiaotong University), Yanlin Wang (Sun Yat-sen University), Lun Du (Microsoft Research), Shi Han (Microsoft Research), Hongyu Zhang (Chongqing University), Dongmei Zhang (Microsoft Research), Michael Lyu (The Chinese University of Hong Kong)
11:45 | 15m | Talk | UniGenCoder: Merging Seq2Seq and Seq2Tree Paradigms for Unified Code Generation | New Ideas and Emerging Results (NIER) | Liangying Shao (School of Informatics, Xiamen University, China), Yanfu Yan (William & Mary), Denys Poshyvanyk (William & Mary), Jinsong Su (School of Informatics, Xiamen University, China)
12:00 | 15m | Talk | How is Google using AI for internal code migrations? | SE In Practice (SEIP) | Stoyan Nikolov (Google, Inc.), Daniele Codecasa (Google, Inc.), Anna Sjovall (Google, Inc.), Maxim Tabachnyk (Google), Siddharth Taneja (Google, Inc.), Celal Ziftci (Google), Satish Chandra (Google, Inc.)
12:15 | 7m | Talk | LLM-Based Test-Driven Interactive Code Generation: User Study and Empirical Evaluation | Journal-first Papers | Sarah Fakhoury (Microsoft Research), Aaditya Naik (University of Pennsylvania), Georgios Sakkas (University of California at San Diego), Saikat Chakraborty (Microsoft Research), Shuvendu K. Lahiri (Microsoft Research)
12:22 | 7m | Talk | The impact of Concept drift and Data leakage on Log Level Prediction Models | Journal-first Papers | Youssef Esseddiq Ouatiti (Queen's University), Mohammed Sayagh (ETS Montreal, University of Quebec), Noureddine Kerzazi (Ensias-Rabat), Bram Adams (Queen's University), Ahmed E. Hassan (Queen's University)