How to Select Pre-Trained Code Models for Reuse? A Learning Perspective
Pre-training a language model and then fine-tuning it has been shown to be an efficient and effective technique for a wide range of code intelligence tasks, such as code generation, code summarization, and vulnerability detection. However, pre-training language models on a large-scale code corpus is computationally expensive. Fortunately, many off-the-shelf pre-trained code models (PCMs), such as CodeBERT, CodeT5, CodeGen, and Code Llama, have been released publicly. These models acquire general code understanding and generation capability during pre-training, which enhances their performance on downstream code intelligence tasks. With an increasing number of these public pre-trained models, selecting the most suitable one to reuse for a specific task is essential. In this paper, we systematically investigate the reusability of PCMs. We first explore three intuitive model selection methods that select by size, training data, or brute-force fine-tuning. Experimental results show that these straightforward techniques either perform poorly or incur high costs. Motivated by these findings, we explore learning-based model selection strategies that utilize pre-trained models without altering their parameters. Specifically, we train proxy models to gauge the performance of pre-trained models, and we measure the distribution deviation between a model's latent features and the task's labels, using their closeness as an indicator of model transferability. We conduct experiments on 100 widely-used open-source PCMs for code intelligence tasks, with sizes ranging from 42.5 million to 3 billion parameters. The results demonstrate that learning-based selection methods reduce selection time to 100 seconds, compared to 2,700 hours with brute-force fine-tuning, with less than 6% performance degradation across related tasks.
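The abstract outlines two learning-based selection strategies: training lightweight proxy models on top of a frozen PCM and measuring how closely the model's latent features align with the task's labels. The sketch below illustrates only the first idea under simplifying assumptions; it is not the paper's exact method. It scores a candidate model by the held-out accuracy of a linear probe trained on mean-pooled hidden states extracted with Hugging Face transformers. The model names, dataset variables, and helper functions are illustrative placeholders.

```python
# Minimal sketch: rank candidate pre-trained code models by the accuracy of a
# cheap linear probe trained on frozen features (an assumed proxy, not the
# paper's exact procedure).
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def extract_features(model_name, code_snippets, device="cpu"):
    """Mean-pool the last-layer hidden states of a frozen pre-trained model."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).to(device).eval()
    feats = []
    with torch.no_grad():
        for snippet in code_snippets:
            inputs = tokenizer(snippet, truncation=True, max_length=256,
                               return_tensors="pt").to(device)
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
            feats.append(hidden.mean(dim=1).squeeze(0).cpu())
    return torch.stack(feats).numpy()


def proxy_score(model_name, code_snippets, labels):
    """Estimate transferability as the validation accuracy of a linear probe."""
    X = extract_features(model_name, code_snippets)
    X_train, X_val, y_train, y_val = train_test_split(
        X, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe.score(X_val, y_val)


# Hypothetical usage: pick the candidate encoder with the highest proxy score.
# candidates = ["microsoft/codebert-base", "microsoft/graphcodebert-base"]
# best = max(candidates, key=lambda m: proxy_score(m, snippets, labels))
```

Because the probe is trained on frozen features, each candidate is scored without fine-tuning its parameters, which is what makes this kind of selection far cheaper than brute-force fine-tuning of every model.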
Session: Mining Software Repositories (Research Papers / Early Research Achievement (ERA) Track / Journal First Track / Reproducibility Studies and Negative Results (RENE) Track)
Fri 7 Mar, 11:00 - 12:30, Eastern Time (US & Canada), Room L-1720
Chair(s): Brittany Reid (Nara Institute of Science and Technology)
11:00 (15m, Talk) An Empirical Study of Transformer Models on Automatically Templating GitHub Issue Reports. Research Papers. Jin Zhang (Hunan Normal University), Maoqi Peng (Hunan Normal University), Yang Zhang (National University of Defense Technology, China)
11:15 (15m, Talk) How to Select Pre-Trained Code Models for Reuse? A Learning Perspective. Research Papers. Zhangqian Bi (Huazhong University of Science and Technology), Yao Wan (Huazhong University of Science and Technology), Zhaoyang Chu (Huazhong University of Science and Technology), Yufei Hu (Huazhong University of Science and Technology), Junyi Zhang (Huazhong University of Science and Technology), Hongyu Zhang (Chongqing University), Guandong Xu (University of Technology), Hai Jin (Huazhong University of Science and Technology). Pre-print available.
11:30 (7m, Talk) Uncovering the Challenges: A Study of Corner Cases in Bug-Inducing Commits. Early Research Achievement (ERA) Track.
11:37 (15m, Talk) A Bot Identification Model and Tool Based on GitHub Activity Sequences. Journal First Track. Natarajan Chidambaram (University of Mons), Alexandre Decan (University of Mons; F.R.S.-FNRS), Tom Mens (University of Mons)
11:52 (15m, Talk) Does the Tool Matter? Exploring Some Causes of Threats to Validity in Mining Software Repositories. Reproducibility Studies and Negative Results (RENE) Track. Nicole Hoess (Technical University of Applied Sciences Regensburg), Carlos Paradis (No Affiliation), Rick Kazman (University of Hawai‘i at Mānoa), Wolfgang Mauerer (Technical University of Applied Sciences Regensburg)