Code Question Answering via Task-Adaptive Sequence-to-Sequence Pre-training
The development of a question answering (QA) system for code can greatly facilitate program understanding for developers. Recently, pre-trained language models (PLMs) have shown promising results on the code QA task. However, directly applying PLMs to code QA often yields suboptimal performance due to the large discrepancy between pre-training and the downstream QA task. While code PLMs are pre-trained on large-scale unlabeled code corpora, annotated QA pairs for fine-tuning are often scarce. Existing code PLMs simply reuse the code representation part and must train the QA part from scratch, which causes the model to overfit the QA data. In this paper, we propose CodeMaster, a novel pre-training based approach for automatically answering code questions via task adaptation. CodeMaster builds on CodeT5, a popular PLM for source code. To mitigate the gap between pre-training and QA, CodeMaster continually pre-trains CodeT5 on multiple self-supervised learning tasks such as partial comment completion and noun-phrase prediction. Experimental results on the CodeQA benchmark show that CodeMaster achieves state-of-the-art performance and highlight the effectiveness of our approach.
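To make the task-adaptive pre-training idea concrete, the sketch below shows how a single "partial comment completion" example might be constructed and used for one continual pre-training step on CodeT5 with the Hugging Face Transformers library. The checkpoint name, input format, and split ratio are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Illustrative sketch (not the authors' code): build one "partial comment
# completion" example and take a single continual pre-training step on CodeT5.
# The task format ("complete comment: ... code: ...") and keep_ratio are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def partial_comment_example(code: str, comment: str, keep_ratio: float = 0.5):
    """Keep the first part of the comment as context; the rest becomes the target."""
    words = comment.split()
    cut = max(1, int(len(words) * keep_ratio))
    prefix, remainder = " ".join(words[:cut]), " ".join(words[cut:])
    # Source: code plus the partial comment; target: the missing continuation.
    source = f"complete comment: {prefix} code: {code}"
    return source, remainder

code = "def add(a, b):\n    return a + b"
comment = "Return the sum of two numbers a and b."
source, target = partial_comment_example(code, comment)

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=64).input_ids

# One continual pre-training step: standard sequence-to-sequence cross-entropy loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
print(f"loss = {loss.item():.4f}")
```

Other self-supervised tasks named in the abstract (e.g., noun-phrase prediction) could be formatted analogously as source/target pairs and mixed into the same continual pre-training loop.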
Wed 7 Dec (displayed time zone: Osaka, Sapporo, Tokyo)

13:00 - 14:00 | Machine Learning 1 | Technical Track at Room2 | Chair(s): Syful Islam (Nara Institute of Science and Technology)

13:00 (20m) Paper | Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking | Technical Track

13:20 (20m) Paper | Code Question Answering via Task-Adaptive Sequence-to-Sequence Pre-training | Technical Track | Tingrui Yu (School of Software, Shanghai Jiao Tong University), Beijun Shen (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University), Xiaodong Gu (Shanghai Jiao Tong University)