Code Question Answering via Task-Adaptive Sequence-to-Sequence Pre-training
The development of a question answering (QA) system for code can greatly facilitate program understanding for developers. Recently, pre-trained language models (PLMs) have shown promising results on the code QA task. However, directly applying PLMs to code QA often yields suboptimal performance due to the large discrepancy between pre-training and the downstream QA task. While code PLMs are pre-trained on large-scale unlabeled code corpora, annotated QA pairs for fine-tuning are often scarce. Existing code PLMs simply reuse the code representation part and must train the QA part from scratch, which causes the model to overfit the QA data. In this paper, we propose CodeMaster, a novel pre-training based approach for automatically answering code questions via task adaptation. CodeMaster builds on CodeT5, a popular PLM for source code. To mitigate the gap between pre-training and QA, CodeMaster continually pre-trains CodeT5 on multiple self-supervised learning tasks such as partial comment completion and noun-phrase prediction. Experimental results on the CodeQA benchmark show that CodeMaster achieves state-of-the-art performance and highlight the effectiveness of our approach.
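To make the task-adaptive pre-training idea concrete, the sketch below shows how a single "partial comment completion" example might be constructed and used for one continual pre-training step on CodeT5 with the Hugging Face Transformers library. The checkpoint name, input format, and split ratio are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
# Illustrative sketch (not the authors' code): build one "partial comment
# completion" example and take a single continual pre-training step on CodeT5.
# The task format ("complete comment: ... code: ...") and keep_ratio are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def partial_comment_example(code: str, comment: str, keep_ratio: float = 0.5):
    """Keep the first part of the comment as context; the rest becomes the target."""
    words = comment.split()
    cut = max(1, int(len(words) * keep_ratio))
    prefix, remainder = " ".join(words[:cut]), " ".join(words[cut:])
    # Source: code plus the partial comment; target: the missing continuation.
    source = f"complete comment: {prefix} code: {code}"
    return source, remainder

code = "def add(a, b):\n    return a + b"
comment = "Return the sum of two numbers a and b."
source, target = partial_comment_example(code, comment)

inputs = tokenizer(source, return_tensors="pt", truncation=True, max_length=512)
labels = tokenizer(target, return_tensors="pt", truncation=True, max_length=64).input_ids

# One continual pre-training step: standard sequence-to-sequence cross-entropy loss.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
print(f"loss = {loss.item():.4f}")
```

Other self-supervised tasks named in the abstract (e.g., noun-phrase prediction) could be formatted analogously as source/target pairs and mixed into the same continual pre-training loop.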
Wed 7 Dec (displayed time zone: Osaka, Sapporo, Tokyo)

13:00 - 14:00 | Machine Learning 1 | Technical Track at Room2 | Chair(s): Syful Islam (Nara Institute of Science and Technology)

13:00 (20m) Paper | Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking | Technical Track

13:20 (20m) Paper | Code Question Answering via Task-Adaptive Sequence-to-Sequence Pre-training | Technical Track | Tingrui Yu (School of Software, Shanghai Jiao Tong University), Beijun Shen (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University), Xiaodong Gu (Shanghai Jiao Tong University)