SPT-Code: Sequence-to-Sequence Pre-Training for Learning Representation of Source Code
Thu 12 May 2022 20:25 - 20:30 at ICSE room 2-even hours - Program Comprehension 4 Chair(s): Fabio Petrillo
Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. However, issues remain in applying these models to software engineering (SE) tasks. First, the majority of pre-trained models focus on pre-training only the encoder of the Transformer. For generation tasks that are addressed with encoder-decoder models, however, there is no reason why the decoder should be left out during pre-training. Second, many existing pre-trained models, including state-of-the-art models such as T5-learning, simply reuse the pre-training tasks designed for natural languages. None of these pre-training tasks is designed to help the model acquire the syntactic structural information of source code. Moreover, to learn the natural language description of source code needed for generation tasks such as code summarization, existing pre-training tasks require a bilingual corpus of source code paired with natural language descriptions, which severely limits the amount of data available for pre-training. To address these weaknesses, we propose SPT-Code, a sequence-to-sequence pre-trained model for source code. To pre-train SPT-Code in a sequence-to-sequence manner while avoiding the shortcomings of existing pre-training tasks, we introduce three pre-training tasks specifically designed to enable SPT-Code to learn knowledge of source code, the corresponding code structure, and a natural language description of the code without relying on any bilingual corpus, and to exploit these three sources of information when applied to downstream tasks. Experimental results demonstrate that SPT-Code achieves state-of-the-art performance on five code-related downstream tasks after fine-tuning.
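To make the encoder-decoder pre-training idea concrete, the sketch below shows a generic sequence-to-sequence denoising step in which both the encoder and the decoder receive gradient updates. It is a minimal illustration only, assuming a BART-style model and tokenizer from the Hugging Face transformers library; the masking objective and the plain code input used here are placeholders, not the three SPT-Code pre-training tasks (code tokens, linearized structure, and natural language description) summarized in the abstract.

    # Minimal sketch of sequence-to-sequence pre-training on source code.
    # Assumptions (not from the paper): a BART-style encoder-decoder from the
    # Hugging Face `transformers` library and a simple span-denoising objective.
    import torch
    from transformers import BartConfig, BartForConditionalGeneration, BartTokenizerFast

    tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
    model = BartForConditionalGeneration(BartConfig(vocab_size=tokenizer.vocab_size))  # trained from scratch

    code = "def add(a, b):\n    return a + b"
    # Corrupt the encoder input by masking a span; the decoder must reconstruct the original code.
    corrupted = "def add(a, b):\n    return <mask>"

    inputs = tokenizer(corrupted, return_tensors="pt")
    labels = tokenizer(code, return_tensors="pt").input_ids

    outputs = model(**inputs, labels=labels)  # cross-entropy over the decoder's output sequence
    outputs.loss.backward()                   # updates encoder and decoder parameters jointly

Because the loss is computed over the decoder's generated sequence, the decoder is trained during pre-training rather than being initialized blindly at fine-tuning time, which is the architectural point the abstract makes about sequence-to-sequence pre-training.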
Thu 12 May | Displayed time zone: Eastern Time (US & Canada)
03:00 - 04:00 | Program Comprehension 2 | Technical Track / Journal-First Papers | ICSE room 1-odd hours | Chair(s): Prajish Prasad (IIT Bombay)
03:00 | 5m Talk | Journal First Submission of the Article: What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk | Journal-First Papers | Pooja Rani (University of Bern), Sebastiano Panichella (Zurich University of Applied Sciences), Manuel Leuenberger (Software Composition Group, University of Bern, Switzerland), Mohammad Ghafari (School of Computer Science, University of Auckland), Oscar Nierstrasz (University of Bern, Switzerland)
03:05 | 5m Talk | An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags | Journal-First Papers | Christian D. Newman (Rochester Institute of Technology), Michael J. Decker (Bowling Green State University), Reem S. Alsuhaibani (Kent State University), Anthony Peruma (Rochester Institute of Technology), Mohamed Wiem Mkaouer (Rochester Institute of Technology), Satyajit Mohapatra (Rochester Institute of Technology), Tejal Vishnoi (Rochester Institute of Technology), Marcos Zampieri (Rochester Institute of Technology), Timothy Sheldon (BNY Mellon), Emily Hill (Drew University)
03:10 | 5m Talk | Why My Code Summarization Approach Does Not Work: Improving Code Summarization with Comment Category Prediction | Journal-First Papers | Qiuyuan Chen (Zhejiang University), Xin Xia (Huawei Software Engineering Application Technology Lab), Han Hu (Faculty of Information Technology, Monash University), David Lo (Singapore Management University), Shanping Li (Zhejiang University)
03:15 | 5m Talk | AST-Trans: Code Summarization with Efficient Tree-Structured Attention | Technical Track | Ze Tang (Software Institute, Nanjing University), Xiaoyu Shen (Alexa AI, Amazon), Chuanyi Li (State Key Laboratory for Novel Software Technology, Nanjing University), Jidong Ge (State Key Laboratory for Novel Software Technology, Nanjing University), Liguo Huang (Dept. of Computer Science, Southern Methodist University, Dallas, TX), Zheling Zhu (State Key Laboratory for Novel Software Technology, Nanjing University), Bin Luo (Software Institute, Nanjing University)
03:20 | 5m Talk | SPT-Code: Sequence-to-Sequence Pre-Training for Learning Representation of Source Code | Technical Track | Changan Niu (State Key Laboratory for Novel Software Technology, Nanjing University), Chuanyi Li (State Key Laboratory for Novel Software Technology, Nanjing University), Vincent Ng (Human Language Technology Research Institute, University of Texas at Dallas), Jidong Ge (State Key Laboratory for Novel Software Technology, Nanjing University), Liguo Huang (Dept. of Computer Science, Southern Methodist University), Bin Luo (Software Institute, Nanjing University)
20:00 - 21:00 | Program Comprehension 4 | Technical Track / SEET - Software Engineering Education and Training / Journal-First Papers | ICSE room 2-even hours | Chair(s): Fabio Petrillo (École de technologie supérieure (ÉTS), Montréal, Université du Québec)
20:00 | 5m Talk | An Ensemble Approach for Annotating Source Code Identifiers with Part-of-speech Tags | Journal-First Papers | Christian D. Newman (Rochester Institute of Technology), Michael J. Decker (Bowling Green State University), Reem S. Alsuhaibani (Kent State University), Anthony Peruma (Rochester Institute of Technology), Mohamed Wiem Mkaouer (Rochester Institute of Technology), Satyajit Mohapatra (Rochester Institute of Technology), Tejal Vishnoi (Rochester Institute of Technology), Marcos Zampieri (Rochester Institute of Technology), Timothy Sheldon (BNY Mellon), Emily Hill (Drew University)
20:05 | 5m Talk | Why My Code Summarization Approach Does Not Work: Improving Code Summarization with Comment Category Prediction | Journal-First Papers | Qiuyuan Chen (Zhejiang University), Xin Xia (Huawei Software Engineering Application Technology Lab), Han Hu (Faculty of Information Technology, Monash University), David Lo (Singapore Management University), Shanping Li (Zhejiang University)
20:10 | 5m Talk | Reading to Write Code: An Experience Report of a Reverse Engineering and Modeling Course | SEET - Software Engineering Education and Training | Brooke Kelsey Ryan (University of California, Irvine), Adriana Meza Soria (UC Irvine), Kaj Dreef (University of California, Irvine), Andre van der Hoek (University of California, Irvine)
20:15 | 5m Talk | Pausing While Programming: Insights From Keystroke Analysis | SEET - Software Engineering Education and Training | Raj Shrestha (Utah State University), Juho Leinonen (Aalto University), Albina Zavgorodniaia (Aalto University), Arto Hellas (University of Helsinki, Finland), John Edwards (Utah State University)
20:20 | 5m Talk | AST-Trans: Code Summarization with Efficient Tree-Structured Attention | Technical Track | Ze Tang (Software Institute, Nanjing University), Xiaoyu Shen (Alexa AI, Amazon), Chuanyi Li (State Key Laboratory for Novel Software Technology, Nanjing University), Jidong Ge (State Key Laboratory for Novel Software Technology, Nanjing University), Liguo Huang (Dept. of Computer Science, Southern Methodist University, Dallas, TX), Zheling Zhu (State Key Laboratory for Novel Software Technology, Nanjing University), Bin Luo (Software Institute, Nanjing University)
20:25 | 5m Talk | SPT-Code: Sequence-to-Sequence Pre-Training for Learning Representation of Source Code | Technical Track | Changan Niu (State Key Laboratory for Novel Software Technology, Nanjing University), Chuanyi Li (State Key Laboratory for Novel Software Technology, Nanjing University), Vincent Ng (Human Language Technology Research Institute, University of Texas at Dallas), Jidong Ge (State Key Laboratory for Novel Software Technology, Nanjing University), Liguo Huang (Dept. of Computer Science, Southern Methodist University), Bin Luo (Software Institute, Nanjing University)
20:30 | 5m Talk | Demystifying the Vulnerability Propagation and Its Evolution via Dependency Trees in the NPM Ecosystem | Technical Track | Chengwei Liu (Tianjin University and Nanyang Technological University), Sen Chen (Tianjin University), Lingling Fan (Nankai University), Bihuan Chen (Fudan University, China), Yang Liu (Nanyang Technological University), Xin Peng (Fudan University)