CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code
Recent work has focused on using machine learning to automate software engineering tasks, such as code completion, code migration, and generating code from natural language descriptions. One of the challenges in these tasks is evaluating the quality of the predictions, which is usually done by comparing a prediction to a reference solution. The BLEU score has been adopted for programming languages because it can be computed easily for any programming language, even for incomplete source code, while enabling fast automated evaluation. However, programming languages are more verbose and have stricter syntax than natural languages. As a result, BLEU finds many common n-grams even in unrelated programs, which makes it hard to distinguish similar pairs of programs from dissimilar ones. This work presents CrystalBLEU, an evaluation metric based on BLEU that mitigates this distinguishability problem. Our metric retains the desirable properties of BLEU, such as handling partial code, applicability to all programming languages, high correlation with human judgement, and efficiency, while reducing the effects of trivially shared n-grams. We evaluate CrystalBLEU on two datasets from previous work and on a new dataset of human-written code. Our results show that CrystalBLEU differentiates similar and unrelated programs better than both the original BLEU score and CodeBLEU, a variant designed specifically for source code.
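The following is a minimal sketch of the idea described above, assuming the metric works by first collecting the k most frequent ("trivially shared") n-grams from a background corpus and then excluding them from BLEU's clipped n-gram counts. The function names, the choice of k, and the whitespace tokenization are illustrative assumptions, not the paper's released implementation.

import math
from collections import Counter
from nltk.util import ngrams

def trivially_shared_ngrams(corpus_tokens, max_n=4, k=500):
    # Count all 1..max_n-grams across the background corpus and keep the k
    # most frequent ones; these are the "trivially shared" n-grams (assumed k).
    counts = Counter()
    for tokens in corpus_tokens:
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return {gram for gram, _ in counts.most_common(k)}

def _modified_precision(refs, cand, n, ignored):
    # BLEU-style clipped precision for order n, ignoring the trivial n-grams.
    cand_counts = Counter(g for g in ngrams(cand, n) if g not in ignored)
    if not cand_counts:
        return None  # no usable n-grams of this order
    max_ref = Counter()
    for ref in refs:
        for g, c in Counter(g for g in ngrams(ref, n) if g not in ignored).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

def crystal_bleu_like(refs, cand, ignored, max_n=4):
    # Geometric mean of the per-order precisions times a brevity penalty,
    # as in standard sentence-level BLEU (no smoothing in this sketch).
    precisions = [p for n in range(1, max_n + 1)
                  if (p := _modified_precision(refs, cand, n, ignored)) is not None]
    if not precisions or any(p == 0 for p in precisions):
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    ref_len = min((len(r) for r in refs),
                  key=lambda rl: (abs(rl - len(cand)), rl))
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / max(len(cand), 1))
    return bp * geo_mean

A hypothetical usage example: tokenize a background corpus (e.g., other programs in the same language), compute the ignored set once, and reuse it for all candidate/reference pairs:

background = [src.split() for src in background_corpus]   # hypothetical corpus
ignored = trivially_shared_ngrams(background, k=500)
score = crystal_bleu_like([reference.split()], candidate.split(), ignored)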