Evaluating few shot and Contrastive learning Methods for Code Clone Detection (MSR 2022 - Registered Reports)

Who

Mohamad Khajezade, Fatemeh Hendijani Fard, Mohamed S Shehata

Track

MSR 2022 Registered Reports


When

Thu 19 May 2022 21:32 - 21:36 at MSR Main room - odd hours - Session 13: Security & Quality Chair(s): Gias Uddin

Abstract

Context: Code Clone Detection (CCD) is a software engineering task used for plagiarism detection, code search, and code comprehension. Recently, deep learning-based models have achieved an F1 score (a metric used to assess classifiers) of ~95% on the CodeXGLUE benchmark. These models require large amounts of training data and are mainly fine-tuned on Java or C++ datasets. However, no previous study evaluates the generalizability of these models when only a limited amount of annotated data is available.

Objective: The main objective of this research is to assess how well CCD models and few-shot learning algorithms handle unseen programming problems and new languages (i.e., problems and languages the model was not trained on).

Method: We assess the generalizability of state-of-the-art CCD models in few-shot settings (i.e., only a few samples are available for fine-tuning) under three scenarios: i) unseen problems, ii) unseen languages, and iii) a combination of new languages and new problems. We use three datasets (BigCloneBench, POJ-104, and CodeNet) and three languages (Java, C++, and Ruby). We then employ Model-Agnostic Meta-Learning (MAML), in which the model learns a meta-learner capable of extracting transferable knowledge from the training set, so that it can be fine-tuned using only a few samples. Finally, we combine contrastive learning with MAML to study whether it improves on the results of MAML alone.
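As a rough illustration of the MAML idea described in the Method paragraph, the sketch below meta-learns a single weight on a toy regression problem. It is not the authors' setup: the tasks (lines with different slopes), the one-parameter model, and all function names and hyperparameters are assumptions chosen so that the inner adaptation step and the outer meta-update fit in a few lines.

```python
import random

def loss(w, data):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of the loss above with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def hessian(data):
    """Second derivative of the loss (constant in w for this model)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def sample_task(rng, k=5):
    """Hypothetical task: fit y = a * x for a task-specific slope a.

    Returns a support set (used to adapt) and a query set (used to
    evaluate the adapted model), each with only k samples.
    """
    a = rng.uniform(1.0, 3.0)
    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return [(x, a * x) for x in xs]
    return draw(), draw()

def adapt(w, support, inner_lr=0.1):
    """Inner loop: one gradient step on the few support samples."""
    return w - inner_lr * grad(w, support)

def maml_train(steps=2000, tasks_per_step=4, inner_lr=0.1, outer_lr=0.05, seed=0):
    """Outer loop: move w so that one inner step works well on new tasks."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            support, query = sample_task(rng)
            w_task = adapt(w, support, inner_lr)
            # Chain rule through the inner step:
            # d(w_task)/dw = 1 - inner_lr * hessian(support)
            meta_grad += grad(w_task, query) * (1 - inner_lr * hessian(support))
        w -= outer_lr * meta_grad / tasks_per_step
    return w

if __name__ == "__main__":
    w_meta = maml_train()
    support, query = sample_task(random.Random(123))
    print("meta-learned w:", w_meta)
    print("query loss before adaptation:", loss(w_meta, query))
    print("query loss after one step:   ", loss(adapt(w_meta, support), query))
```

In a clone-detection setting the scalar weight would be replaced by the parameters of a pretrained code model and each task by a small set of labeled code pairs from one problem or language, but the two-level structure (adapt on a support set, update the shared initialization from the query loss) stays the same.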

Link to Preprint

http://arxiv.org/abs/2204.07501

Mohamad Khajezade

University of British Columbia

Fatemeh Hendijani Fard

University of British Columbia

Canada

Mohamed S Shehata

University of British Columbia


Session Program

Thu 19 May
Displayed time zone: Eastern Time (US & Canada)

21:00 - 21:50	Session 13: Security & Quality (Technical Papers / Data and Tool Showcase Track / Registered Reports / Industry Track) at MSR Main room - odd hours. Chair(s): Gias Uddin (University of Calgary, Canada)

21:00 7m Talk	On the Use of Fine-grained Vulnerable Code Statements for Software Vulnerability Assessment Models (Technical Papers). Triet Le (The University of Adelaide), Muhammad Ali Babar (University of Adelaide). Pre-print
21:07 7m Talk	LineVD: Statement-level Vulnerability Detection using Graph Neural Networks (Technical Papers). David Hin (The University of Adelaide), Andrey Kan (The University of Adelaide), Huaming Chen (The University of Adelaide), Muhammad Ali Babar (University of Adelaide)
21:14 7m Talk	LineVul: A Transformer-based Line-Level Vulnerability Prediction (Technical Papers). Michael Fu (Monash University), Kla Tantithamthavorn (Monash University). Pre-print
21:21 4m Talk	ECench: An Energy Bug Benchmark of Ethereum Client Software (Data and Tool Showcase Track). Jinyoung Kim (Sungkyunkwan University), Misoo Kim (Sungkyunkwan University), Eunseok Lee (Sungkyunkwan University)
21:25 7m Talk	Microsoft CloudMine: Data Mining for the Executive Order on Improving the Nation’s Cybersecurity (Industry Track). Kim Herzig (Tools for Software Engineers, Microsoft), Luke Gostling (Microsoft Corporation), Maximilian Grothusmann (Microsoft Corporation), Nora Huang (Microsoft Corporation), Sascha Just (Microsoft), Alan Klimowski (Microsoft Corporation), Yashasvini Ramkumar (Microsoft Corporation), Myles McLeroy (Microsoft Corporation), Kıvanç Muşlu (Microsoft), Hitesh Sajnani (Microsoft), Varsha Vadaga (Microsoft Corporation)
21:32 4m Talk	Evaluating few shot and Contrastive learning Methods for Code Clone Detection (Registered Reports). Mohamad Khajezade (University of British Columbia), Fatemeh Hendijani Fard (University of British Columbia), Mohamed S Shehata (University of British Columbia). Pre-print
21:36 14m Live Q&A	Discussions and Q&A (Technical Papers)
