Classifying Source Code: How Far Can Compressor-based Classifiers Go?
Thu 18 Apr 2024 11:12 - 11:24 at Vianna da Motta - SRC Presentations Chair(s): Mattia Fazzini, André Restivo
Pre-trained language models of code, which are built upon largescale datasets, millions of trainable parameters, and high computational resources cost, have achieved phenomenal success. Recently, researchers have proposed a compressor-based classifier (Cbc); it trains no parameters but is found to outperform BERT. We conduct the first empirical study to explore whether this lightweight alternative can accurately classify source code. Our study is more than applying Cbc to code-related tasks. We first identify an issue that the original implementation overestimates Cbc. After correction, Cbc’s performance on defect prediction drops from 80.7% to 63.0%, which is still comparable to CodeBERT (63.7%). We find that hyperparameter settings affect the performance. Besides, results show that Cbc can outperform CodeBERT when the training data is small, making it a good alternative in low-resource settings.
Wed 17 AprDisplayed time zone: Lisbon change
16:00 - 17:30 | SRC PostersSRC - ACM Student Research Competition at Open Space Chair(s): Mattia Fazzini University of Minnesota, André Restivo LIACC, Universidade do Porto, Porto, Portugal | ||
16:00 90mPoster | Program Decomposition and Translation with Static Analysis SRC - ACM Student Research Competition Ali Reza Ibrahimzada University of Illinois Urbana-Champaign DOI Pre-print File Attached | ||
16:00 90mPoster | IntTracer: Sanitization-aware IO2BO Vulnerability Detection across Codebases SRC - ACM Student Research Competition Xiang Chen Shanghai Jiao Tong University | ||
16:00 90mPoster | Vulnerability Root Cause Function Locating For Java Vulnerabilities SRC - ACM Student Research Competition Lyuye Zhang Nanyang Technological University | ||
16:00 90mPoster | Flakiness Repair in the Era of Large Language Models SRC - ACM Student Research Competition Yang Chen University of Illinois at Urbana-Champaign | ||
16:00 90mPoster | Refining Abstract Specifications into Dangerous Traffic Scenarios SRC - ACM Student Research Competition Aren Babikian McGill University | ||
16:00 90mPoster | An Ensemble Method for Bug Triaging using Large Language Models SRC - ACM Student Research Competition Atish Kumar Dipongkor University of Central Florida | ||
16:00 90mPoster | Classifying Source Code: How Far Can Compressor-based Classifiers Go? SRC - ACM Student Research Competition Zhou Yang Singapore Management University |
Thu 18 AprDisplayed time zone: Lisbon change
11:00 - 12:30 | SRC PresentationsSRC - ACM Student Research Competition at Vianna da Motta Chair(s): Mattia Fazzini University of Minnesota, André Restivo LIACC, Universidade do Porto, Porto, Portugal | ||
11:00 12mPoster | An Ensemble Method for Bug Triaging using Large Language Models SRC - ACM Student Research Competition Atish Kumar Dipongkor University of Central Florida | ||
11:12 12mPoster | Classifying Source Code: How Far Can Compressor-based Classifiers Go? SRC - ACM Student Research Competition Zhou Yang Singapore Management University | ||
11:24 12mPoster | Flakiness Repair in the Era of Large Language Models SRC - ACM Student Research Competition Yang Chen University of Illinois at Urbana-Champaign | ||
11:36 12mPoster | Program Decomposition and Translation with Static Analysis SRC - ACM Student Research Competition Ali Reza Ibrahimzada University of Illinois Urbana-Champaign DOI Pre-print File Attached | ||
11:48 12mPoster | Refining Abstract Specifications into Dangerous Traffic Scenarios SRC - ACM Student Research Competition Aren Babikian McGill University |