ICPC 2022
Mon 16 - Tue 17 May 2022
co-located with ICSE 2022
Mon 16 May 2022 21:28 - 21:35 at ICPC room - Session 9: Program Representation 2 Chair(s): Lingxiao Jiang

Pre-trained Language Models (PLMs) such as CodeBERT and GraphCodeBERT, when trained on a large corpus of code, have recently displayed promising results on Software Engineering (SE) downstream tasks. A PLM is most useful when it can be leveraged to improve performance on code corpora written in low-resource programming languages, for which training data is limited. In this work, we study the impact of PLMs on a low-resource programming language corpus; specifically, we choose Ruby as the study subject.
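
For readers unfamiliar with these models, the sketch below shows how a Ruby snippet can be encoded with CodeBERT via the HuggingFace Transformers library. The checkpoint names are the public releases ("microsoft/codebert-base"; swap in "microsoft/graphcodebert-base" for GraphCodeBERT); the snippet is only illustrative and is not the authors' pipeline.

```python
# Minimal sketch: encoding a Ruby snippet with CodeBERT via HuggingFace
# Transformers. Illustrative only; not the paper's experimental setup.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

ruby_snippet = 'def greet(name)\n  puts "Hello, #{name}!"\nend'
inputs = tokenizer(ruby_snippet, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# outputs.last_hidden_state holds one contextual vector per code token,
# usable as input features for downstream SE tasks such as Code
# Summarization or Code Search.
print(outputs.last_hidden_state.shape)
```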

A recent study by Ahmed and Devanbu reported that fine-tuning multilingual PLMs on a multilingual code corpus achieves higher performance than fine-tuning on a corpus written in a single programming language. However, no analysis was made with respect to monolingual PLMs. Furthermore, programming languages can be inherently different, and code written in one language usually cannot be interchanged with another; for example, Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular SE tasks: Code Summarization and Code Search, 2) the strategy (for selecting programming languages) that works well when fine-tuning multilingual PLMs for Ruby, and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths. For the last point, we bin the Ruby code based on its number of tokens; understanding the performance across code lengths will enable developers to make more informed decisions about the use of PLMs on their code.
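
As a concrete picture of the length-binning analysis, the sketch below groups code samples by token count so that metrics can then be reported per bin. The bin boundaries and the whitespace tokenizer are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch of binning code samples by token count so that
# per-bin metrics (e.g., BLEU for Code Summarization, MRR for Code
# Search) can be reported separately. Bin boundaries and the whitespace
# tokenizer are assumptions for illustration, not the paper's settings.
from collections import defaultdict

BINS = [(0, 50), (50, 100), (100, 200), (200, float("inf"))]

def bin_label(num_tokens):
    for low, high in BINS:
        if low <= num_tokens < high:
            return f"{low}-{high} tokens"
    return "unbinned"

def bin_by_length(samples, tokenize=str.split):
    """Group raw code strings into length bins."""
    bins = defaultdict(list)
    for code in samples:
        bins[bin_label(len(tokenize(code)))].append(code)
    return bins

ruby_samples = ["def add(a, b)\n  a + b\nend"]
print({label: len(codes) for label, codes in bin_by_length(ruby_samples).items()})
```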

In this work, we analyze over a hundred trained and fine-tuned models. Our results show that 1) multilingual PLMs have a higher time-to-performance ratio (the duration of fine-tuning divided by the BLEU, METEOR, or MRR score) than monolingual PLMs, 2) our proposed strategy for selecting target programming languages when fine-tuning multilingual PLMs is effective: it not only reduces the fine-tuning time but also achieves higher performance on the Code Summarization and Code Search tasks, and 3) our proposed strategy performs consistently well across different code lengths.
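
To make the first result concrete, the helper below computes the time-to-performance ratio exactly as defined above: fine-tuning duration divided by a quality score. The function name and the example numbers are hypothetical.

```python
# Hypothetical helper mirroring the time-to-performance ratio defined
# above: fine-tuning duration divided by a quality score (BLEU/METEOR
# for Code Summarization, MRR for Code Search). A lower ratio means
# less fine-tuning time is spent per unit of task performance.
def time_to_performance(fine_tune_seconds: float, score: float) -> float:
    if score <= 0:
        raise ValueError("score must be positive")
    return fine_tune_seconds / score

# Example comparison between two hypothetical fine-tuning runs:
mono = time_to_performance(fine_tune_seconds=5400.0, score=13.1)
multi = time_to_performance(fine_tune_seconds=7200.0, score=14.6)
print(f"monolingual: {mono:.1f} s/BLEU, multilingual: {multi:.1f} s/BLEU")
```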

Mon 16 May

Displayed time zone: Eastern Time (US & Canada)

21:00 - 21:50
Session 9: Program Representation 2 (Research) at ICPC room
Chair(s): Lingxiao Jiang Singapore Management University
21:00
7m
Talk
HELoC: Hierarchical Contrastive Learning of Source Code Representation
Research
Xiao Wang (Shandong Normal University), Qiong Wu (Shandong Normal University), Hongyu Zhang (University of Newcastle), Chen Lyu (Shandong Normal University), Xue Jiang (Shandong Normal University), Zhuoran Zheng (Nanjing University of Science and Technology), Lei Lyu (Shandong Normal University), Songlin Hu (Institute of Information Engineering, Chinese Academy of Sciences)
Media Attached
21:07
7m
Talk
Exploring GNN Based Program Embedding Technologies for Binary related Tasks
Research
Yixin Guo (Peking University), Pengcheng Li (Google, Inc.), Yingwei Luo (Peking University), Xiaolin Wang (Peking University), Zhenlin Wang (Michigan Technological University)
Media Attached
21:14
7m
Talk
Learning Heterogeneous Type Information in Program Graphs
Research
Kechi Zhang (Peking University), Wenhan Wang (Nanyang Technological University), Huangzhao Zhang (Peking University), Ge Li (Peking University), Zhi Jin (Peking University)
DOI Pre-print Media Attached
21:21
7m
Talk
Unified Abstract Syntax Tree Representation Learning for Cross-language Program Classification
Research
Kesu Wang (Nanjing University), Meng Yan (Chongqing University), He Zhang (Nanjing University), Haibo Hu (Chongqing University)
Media Attached
21:28
7m
Talk
On the Transferability of Pre-trained Language Models for Low-Resource Programming Languages
Research
Fuxiang Chen (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia), David Lo (Singapore Management University), Timofey Bryksin (JetBrains Research; HSE University)
Pre-print Media Attached
21:35
15m
Live Q&A
Q&A - Paper Session 9
Research


Information for Participants
Mon 16 May 2022 21:00 - 21:50 at ICPC room - Session 9: Program Representation 2 Chair(s): Lingxiao Jiang