APSEC 2024
Tue 3 - Fri 6 December 2024 China

Given a natural-language query, code search aims to retrieve the corresponding target code from a code base, which can accelerate the software development process. Recent deep-learning-based pre-trained code models can capture the semantic connection between programming language and natural language, generating more accurate vector representations for code and queries and significantly improving the matching accuracy between the two. However, most recent research on code search focuses only on improving accuracy while neglecting efficiency. In this paper, we propose CoSTV, a novel code search framework that speeds up the code search process. CoSTV adopts a two-stage paradigm that combines the efficiency of a bi-encoder with the accuracy of a cross-encoder, decoupling code search into a recall stage and a re-rank stage. Specifically, we introduce a vector retrieval system, program simplification, and knowledge distillation to substantially accelerate code search while retaining comparable accuracy. In the recall stage, CoSTV uses a bi-encoder code search model and a vector retrieval engine to rapidly recall highly relevant code candidates. In the re-rank stage, CoSTV employs a cross-encoder-based code search model together with program simplification and model distillation to improve the precision of code search. Extensive experiments on the CodeSearchNet dataset show that, compared with previous code search baselines, CoSTV reduces code search time by 79.1% while improving accuracy by 7.93% on average.
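To make the two-stage recall/re-rank paradigm concrete, the sketch below shows one way such a pipeline can be wired up. It is a minimal illustration, not the paper's implementation: the bi-encoder, cross-encoder, corpus, and exact inner-product search are stand-ins (the paper's actual models, vector retrieval engine, program simplification, and distillation steps are not reproduced here).

```python
# Conceptual two-stage (recall + re-rank) code search pipeline in the spirit of
# CoSTV. All components below are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)
DIM = 256  # embedding dimension (illustrative)

def bi_encode(texts):
    """Stand-in bi-encoder: maps each text independently to a dense unit vector."""
    vecs = rng.normal(size=(len(texts), DIM)).astype("float32")
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def cross_score(query, code):
    """Stand-in cross-encoder: jointly scores a (query, code) pair."""
    return float(len(set(query.split()) & set(code.split())))

corpus = [
    "def add(a, b): return a + b",
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
]

# Offline: encode the code base once so the recall stage only does vector math.
code_vecs = bi_encode(corpus)

def search(query, recall_k=2):
    # Stage 1 (recall): exact inner-product search over pre-computed code
    # embeddings, standing in for a vector retrieval engine.
    q_vec = bi_encode([query])[0]
    scores = code_vecs @ q_vec
    candidates = np.argsort(-scores)[:recall_k]
    # Stage 2 (re-rank): apply the more expensive cross-encoder only to the
    # small candidate set, which is what keeps the pipeline fast.
    reranked = sorted(candidates,
                      key=lambda i: cross_score(query, corpus[i]),
                      reverse=True)
    return [corpus[i] for i in reranked]

print(search("read a file and return its contents"))
```

Because the cross-encoder only ever sees the top-k recalled candidates rather than the whole corpus, its per-query cost is bounded by k, which is the design choice that lets accuracy-oriented re-ranking coexist with fast retrieval.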