UNGOML: Automated Classification of unsafe Usages in Go (MSR 2023 - Technical Papers)

Who

Anna-Katharina Wickert, Clemens Damke, Lars Baumgärtner, Eyke Hüllermeier, Mira Mezini

Track

MSR 2023 Technical Papers

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

Use conference time zone: (GMT+11:00) HobartSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Mon 15 May 2023 16:35 - 16:47 at Meeting Room 110 - Security Chair(s): Chanchal K. Roy

Abstract

The Go programming language offers strong protection from memory corruption. As an escape hatch of these protections, it provides the unsafe package. Previous studies identified that this unsafe package is frequently used in real-world code for several purposes, e.g., serialization or casting types. Due to the variety of these reasons, it may be possible to refactor specific usages to avoid potential vulnerabilities. However, the classification of unsafe usages is challenging and requires the context of the call and the program’s structure. In this paper, we present the first automated classifier for unsafe usages in Go, UNGOML, to identify what is done with the unsafe package and why it is used. For UNGOML, we built four custom deep-learning classifiers trained on a manually labeled data set. We represent Go code as enriched control-flow graphs (CFGs) and solve the label prediction task with one single-vertex and three context-aware classifiers. All three context-aware classifiers achieve a top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore, in a set-valued conformal prediction setting, we achieve accuracies of more than 93% with mean label set sizes of 2 for both dimensions. Thus, UNGOML can be used to efficiently filter unsafe usages for use cases such as refactoring or a security audit.

Link to Preprint

https://arxiv.org/pdf/2306.00694.pdf

File attachments

Slides of presentation (2023_ungoml.pdf)	4.11MiB

Anna-Katharina Wickert

TU Darmstadt, Germany

Germany

Clemens Damke

University of Munich (LMU)

Lars Baumgärtner

Technische Universität Darmstadt

Eyke Hüllermeier

University of Munich (LMU)

Mira Mezini

TU Darmstadt