Improving Code Search with Co-Attentive Representation Learning
Searching and reusing existing code from a large-scale codebase, e.g., GitHub, can help developers complete a programming task efficiently. Recently, Gu et al. proposed a deep learning-based model (i.e., DeepCS) that significantly outperformed prior models. DeepCS embeds code snippets and natural language queries into vectors with two separate LSTM (long short-term memory) models, and returns the code snippets most similar to a code search query. However, this embedding method learns two isolated representations for code and query and ignores their internal semantic correlations. As a result, the isolated representations may limit the effectiveness of code search. To address this issue, we propose a co-attentive representation learning model, i.e., Co-Attentive Representation Learning Code Search-CNN (CARLCS-CNN). CARLCS-CNN learns interdependent representations for the embedded code and query with a co-attention mechanism. Generally, this mechanism learns a correlation matrix between the embedded code and query, and co-attends their semantic relationship via row/column-wise max-pooling. In this way, the semantic correlation between code and query directly affects their individual representations. We evaluate the effectiveness of CARLCS-CNN on Gu et al.'s dataset with 10k queries. Experimental results show that CARLCS-CNN significantly outperforms DeepCS by 26.72% in terms of MRR (mean reciprocal rank). Additionally, CARLCS-CNN is five times faster than DeepCS in model training and four times faster in testing.
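The co-attention step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes token-level code embeddings `C`, query embeddings `Q`, and a learned bilinear correlation matrix `U` (all names hypothetical); CARLCS-CNN additionally applies CNN layers before attention, which are omitted here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def co_attention(C, Q, U):
    """Co-attend code token embeddings C (m x d) and query token
    embeddings Q (n x d) through a correlation matrix, then pool.

    Returns attended code and query representations (each of size d).
    """
    # Correlation matrix between every code token and every query token (m x n).
    F = np.tanh(C @ U @ Q.T)
    # Row-wise max-pooling: how strongly each code token matches the query.
    a_c = softmax(F.max(axis=1))
    # Column-wise max-pooling: how strongly each query token matches the code.
    a_q = softmax(F.max(axis=0))
    # Attention-weighted sums give interdependent representations.
    r_c = a_c @ C
    r_q = a_q @ Q
    return r_c, r_q

# Toy usage: 7 code tokens, 4 query tokens, embedding size 16.
rng = np.random.default_rng(0)
C = rng.standard_normal((7, 16))
Q = rng.standard_normal((4, 16))
U = 0.1 * rng.standard_normal((16, 16))
r_c, r_q = co_attention(C, Q, U)
# Rank candidates by cosine similarity between the two representations.
sim = (r_c @ r_q) / (np.linalg.norm(r_c) * np.linalg.norm(r_q))
```

Because the attention weights for the code depend on the query (and vice versa) through the shared correlation matrix `F`, each side's final representation reflects the other, which is the interdependence the model exploits at ranking time.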
Tue 14 Jul (times shown in UTC, Coordinated Universal Time)
07:00 - 08:00 | Session 5: For Researchers (Research / ERA / Tool Demonstration at ICPC). Chair(s): Bin Lin, Università della Svizzera italiana (USI)
07:00 | 15m | Paper (Research): A Literature Review of Automatic Traceability Links Recovery for Software Change Impact Analysis. Thazin Win Win Aung, University of Technology Sydney; Yulei Sui, University of Technology Sydney, Australia; Huan Huo, University of Technology Sydney
07:15 | 15m | Paper (Research): Improving Code Search with Co-Attentive Representation Learning. Jianhang Shuai, School of Big Data & Software Engineering, Chongqing University; Ling Xu, School of Big Data & Software Engineering, Chongqing University; Chao Liu, Zhejiang University; Meng Yan, School of Big Data & Software Engineering, Chongqing University; Xin Xia, Monash University; Yan Lei, School of Big Data & Software Engineering, Chongqing University
07:30 | 15m | Paper (Tool Demonstration): OpenSZZ: A Free, Open-Source, Web-Accessible Implementation of the SZZ Algorithm. Valentina Lenarduzzi, LUT University; Fabio Palomba, University of Salerno; Davide Taibi, Tampere University; Damian Andrew Tamburri, Jheronimus Academy of Data Science
07:45 | 15m | Paper (ERA): Staged Tree Matching for Detecting Code Move across Files. Akira Fujimoto, Osaka University; Yoshiki Higo, Osaka University; Junnosuke Matsumoto; Shinji Kusumoto, Osaka University