APSEC 2024
Tue 3 - Fri 6 December 2024 China
Thu 5 Dec 2024 15:00 - 15:20 at Room 3 (Xiangquan Ballroom) - Session (10) Chair(s): In-Young Ko

Given a natural-language query, code search retrieves the matching code from a code base, which can speed up software development. Recent pre-trained code models based on deep learning capture the semantic connection between programming language and natural language and produce more accurate vector representations of code and queries, which significantly improves matching accuracy. However, most recent research on code search focuses only on accuracy and neglects efficiency. In this paper, we propose CoSTV, a novel code search framework that speeds up the code search process. CoSTV uses a two-stage paradigm that decouples code search into a recall stage and a re-rank stage, combining the efficiency of a bi-encoder with the accuracy of a cross-encoder. Specifically, we introduce a vector retrieval system, program simplification, and knowledge distillation to substantially accelerate code search while retaining comparable accuracy. In the recall stage, CoSTV uses a bi-encoder code search model and a vector retrieval engine to rapidly recall highly relevant code candidates. In the re-rank stage, CoSTV employs a cross-encoder-based code search model, program simplification, and model distillation to enhance the precision of code search. Extensive experiments on the CodeSearchNet dataset indicate that, compared with previous code search baselines, CoSTV reduces code search time by 79.1% while improving accuracy by 7.93% on average.
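The recall and re-rank split described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not CoSTV's implementation: `embed` is a toy bag-of-words stand-in for a bi-encoder, `cross_score` is a toy token-overlap stand-in for a cross-encoder, a brute-force scan replaces the vector retrieval engine, and the function names and corpus are hypothetical. Program simplification and knowledge distillation are not shown.

```python
import math
import re
from collections import Counter

def tokens(text):
    # Lowercase alphabetic runs, so "read_file" yields "read" and "file".
    return re.findall(r"[a-z]+", text.lower())

def embed(text):
    # Toy stand-in for a bi-encoder: an L2-normalised bag-of-words vector.
    counts = Counter(tokens(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}

def dot(u, v):
    # Dot product of two sparse vectors stored as dicts.
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())

def build_index(corpus):
    # Code vectors are computed once, offline, independently of any query.
    return [(code, embed(code)) for code in corpus]

def recall(query, index, k):
    # Stage 1: compare the query vector with every stored code vector.
    # A vector retrieval engine would replace this brute-force scan.
    q = embed(query)
    ranked = sorted(index, key=lambda item: dot(q, item[1]), reverse=True)
    return [code for code, _ in ranked[:k]]

def cross_score(query, code):
    # Toy stand-in for a cross-encoder: it sees the query and the code together.
    # Here: the fraction of distinct query tokens that occur in the code.
    q = set(tokens(query))
    return len(q & set(tokens(code))) / len(q) if q else 0.0

def search(query, index, k=2):
    # Stage 2: apply the more expensive pairwise scorer to the k candidates only.
    candidates = recall(query, index, k)
    return sorted(candidates, key=lambda c: cross_score(query, c), reverse=True)

# Hypothetical three-snippet code base.
CORPUS = [
    "def read_file(path): return open(path).read()",
    "def sort_list(items): return sorted(items)",
    "def write_file(path, data): open(path, 'w').write(data)",
]

results = search("read file from path", build_index(CORPUS), k=2)
print(results)
```

The point of the split is cost: the pairwise scorer runs on only `k` candidates per query, while the per-snippet vectors used in the recall stage are computed ahead of time.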

Thu 5 Dec

Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi

14:00 - 15:30
Session (10) Technical Track / SEIP - Software Engineering in Practice at Room 3 (Xiangquan Ballroom)
Chair(s): In-Young Ko Korea Advanced Institute of Science and Technology
14:00
30m
Talk
Why not Just Look For Answers? Using A More Direct Way for API Recommendation
Technical Track
Changxin Liu Chongqing University, Ling Xu School of Big Data & Software Engineering, Chongqing University, Wenhan Mu Chongqing University, Rui Qin Chongqing University
14:30
30m
Talk
Learning Heterogeneous Abstract Code Graph Representations For Program Comprehension
Technical Track
Shenning Song The College of Computer Science and Technology, Jilin University, Mengxi Zhang The College of Computer Science and Technology, Jilin University, Shaoquan Li The College of Computer Science and Technology, Jilin University, Huaxiao Liu The College of Computer Science and Technology, Jilin University
15:00
20m
Talk
CoSTV: Accelerating Code Search with Two-Stage Paradigm and Vector Retrieval
SEIP - Software Engineering in Practice
Dewu Zheng Sun Yat-sen University, Yanlin Wang Sun Yat-sen University, Wenqing Chen Sun Yat-sen University, Jiachi Chen Sun Yat-sen University, Zibin Zheng Sun Yat-sen University