CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code
Tue 11 Oct 2022, 14:50 - 15:10, at Banquet A - Technical Session 6 - Source Code Manipulation. Chair(s): Collin McMillan

Recent years have brought a surge of work on predicting pieces of source code, e.g., for code completion, code migration, program repair, or translating natural language into code. All this work faces the challenge of evaluating the quality of a prediction with respect to some oracle, typically in the form of a reference solution. A common evaluation metric is the BLEU score, an n-gram-based metric originally proposed for evaluating natural language translation, but adopted in software engineering because it can be easily computed on any programming language and enables automated evaluation at scale. However, a key difference between natural and programming languages is that in the latter, completely unrelated pieces of code may have many common n-grams simply because of syntactic verbosity and coding conventions. We observe that these trivially shared n-grams hamper the ability of the metric to distinguish between truly similar code examples and code examples that are merely written in the same language. This paper presents CrystalBLEU, an evaluation metric based on BLEU that measures the similarity of code precisely and efficiently. Our metric preserves the desirable properties of BLEU, such as being language-agnostic, able to handle incomplete or partially incorrect code, and efficient, while reducing the noise caused by trivially shared n-grams. We evaluate CrystalBLEU on two datasets from prior work and on a new, labeled dataset of semantically equivalent programs. Our results show that CrystalBLEU can distinguish similar from dissimilar code examples 1.9–4.5 times more effectively than the original BLEU score and a previously proposed variant of BLEU for code.
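To make the core idea concrete, below is a minimal, hypothetical sketch in Python of how trivially shared n-grams can be identified from a corpus of code and excluded from an n-gram-overlap score. This is not the authors' implementation: the whitespace tokenization, the cutoff k, and the simplified scoring (a plain clipped-precision average, without BLEU's brevity penalty or smoothing) are assumptions made purely for illustration.

from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def trivially_shared_ngrams(corpus, max_n=4, k=50):
    # The k most frequent n-grams (n = 1..max_n) across a code corpus.
    counts = Counter()
    for tokens in corpus:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {gram for gram, _ in counts.most_common(k)}

def overlap_score(candidate, reference, ignore, max_n=4):
    # Clipped n-gram precision averaged over n, skipping the ignored n-grams.
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(g for g in ngrams(candidate, n) if g not in ignore)
        ref = Counter(g for g in ngrams(reference, n) if g not in ignore)
        total = sum(cand.values())
        if total == 0:
            continue
        matched = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(matched / total)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Illustrative usage on whitespace-tokenized Java-like snippets.
corpus = [s.split() for s in [
    "public int add ( int a , int b ) { return a + b ; }",
    "public int sub ( int a , int b ) { return a - b ; }",
]]
trivial = trivially_shared_ngrams(corpus, k=10)
candidate = "public int add ( int x , int y ) { return x + y ; }".split()
print(overlap_score(candidate, corpus[0], ignore=trivial))

In practice, the set of trivially shared n-grams would be computed once from a larger corpus of code in the target language and then reused across all comparisons; the sketch above only shows how excluding such n-grams shifts the score toward the non-trivial overlap between candidate and reference.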

Tue 11 Oct

Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:30
Technical Session 6 - Source Code Manipulation
NIER Track / Research Papers / Late Breaking Results at Banquet A
Chair(s): Collin McMillan University of Notre Dame
14:00
10m
Vision and Emerging Results
Automatic Code Documentation Generation Using GPT-3
NIER Track
Junaed Younus Khan University of Calgary, Gias Uddin University of Calgary, Canada
14:10
20m
Research paper
Automated Feedback Generation for Competition-Level Code
Research Papers
Jialu Zhang Yale University, De Li The MathWorks, Inc., John C. Kolesar Yale University, Hanyuan Shi N/A, Ruzica Piskac Yale University
14:30
10m
Paper
Generalizability of Code Clone Detection on CodeBERT
Late Breaking Results
Tim Sonnekalb German Aerospace Center (DLR), Bernd Gruner German Aerospace Center (DLR), Clemens-Alexander Brust German Aerospace Center (DLR), Patrick Mäder Technische Universität Ilmenau
DOI Pre-print
14:40
10m
Vision and Emerging Results
Next Syntactic-Unit Code Completion and Applications
NIER Track
Hoan Anh Nguyen Amazon, Aashish Yadavally University of Texas at Dallas, Tien N. Nguyen University of Texas at Dallas
14:50
20m
Research paper
CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code (Virtual)
ACM SIGSOFT Distinguished Paper Award
Research Papers
Aryaz Eghbali University of Stuttgart, Germany, Michael Pradel University of Stuttgart
15:10
20m
Research paper
Low-Resources Project-Specific Code Summarization (Virtual)
Research Papers
Rui Xie Peking University, Tianxiang Hu Peking University, Wei Ye Peking University, Shikun Zhang Peking University