SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly (CGO 2024 - Main Conference)

Who

Jordi Armengol-Estapé, Jackson Woodruff, Chris Cummins, Michael F. P. O'Boyle

Track

CGO 2024 Main Conference

When

Mon 4 Mar 2024 12:10 - 12:30 at Tinto - Machine-Learning Guided Optimizations Chair(s): Zheng Wang

Abstract

Decompilation is a well-studied area with numerous high-quality tools available. These are frequently used for security tasks and to port legacy code. However, they regularly generate difficult-to-read programs and require a large amount of engineering effort to support new programming languages and ISAs. Recent interest in neural approaches has produced portable tools that generate readable code. Nevertheless, to date such techniques are usually restricted to synthetic programs without optimization, and no models have evaluated their portability. Furthermore, while the code generated may be more readable, it is usually incorrect.

This paper presents SLaDe, a Small Language model Decompiler based on a sequence-to-sequence Transformer trained over real-world code and augmented with a type inference engine. We utilize a novel tokenizer, dropout-free regularization, and type inference to generate programs that are more readable and accurate than standard analytic and recent neural approaches. Unlike standard approaches, SLaDe can infer out-of-context types and, unlike neural approaches, it generates correct code.

We evaluate SLaDe on over 4,000 ExeBench functions on two ISAs and at two optimization levels. SLaDe is up to 6× more accurate than Ghidra, a state-of-the-art, industrial-strength decompiler, and up to 4× more accurate than the large language model ChatGPT, and it generates significantly more readable code than both.

Link to Preprint

https://arxiv.org/abs/2305.12520

Jordi Armengol-Estapé, University of Edinburgh, United Kingdom
Jackson Woodruff, University of Edinburgh, United Kingdom
Chris Cummins, Meta AI Research, United States
Michael F. P. O'Boyle, University of Edinburgh, United Kingdom

Session Program

Mon 4 Mar
11:30 - 12:50	Machine-Learning Guided Optimizations (Main Conference) at Tinto. Chair(s): Zheng Wang (University of Leeds)

11:30 20m Talk		AskIt: Unified Programming Interface for Programming with Large Language Models. Katsumi Okuda (Massachusetts Institute of Technology; Mitsubishi Electric Corporation), Saman Amarasinghe (Massachusetts Institute of Technology)
11:50 20m Talk		Revealing Compiler Heuristics through Automated Discovery and Optimization. Volker Seeker (Meta AI Research), Chris Cummins (Meta AI Research), Murray Cole (University of Edinburgh), Björn Franke (University of Edinburgh), Kim Hazelwood (Meta AI Research), Hugh Leather (Meta AI Research)
12:10 20m Talk		SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly. Jordi Armengol-Estapé (University of Edinburgh), Jackson Woodruff (University of Edinburgh), Chris Cummins (Meta AI Research), Michael F. P. O'Boyle (University of Edinburgh)
12:30 20m Talk		TapeFlow: Streaming Gradient Tapes in Automatic Differentiation. Milad Hakimi (Simon Fraser University), Arrvindh Shriraman (Simon Fraser University)