CGO 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Mon 4 Mar 2024 12:30 - 12:50 at Tinto - Machine-Learning Guided Optimizations Chair(s): Zheng Wang

Computing gradients is a crucial task in many domains, including machine learning, physics simulations, and scientific computing. Automatic differentiation (AD) computes gradients for arbitrary imperative code. In reverse-mode AD, an auxiliary structure, the tape, is used to transfer the intermediate values required for gradient computation. The challenge is how to organize the tape in the memory hierarchy, since it has a high reuse distance, lacks temporal locality, and inflates the working set by 2–4×. We introduce Tapeflow, a compiler framework to orchestrate and manage the gradient tape. We make three key contributions. i) We introduce the concept of regions, which transforms the tape layout into an array-of-structs format to improve spatial reuse. ii) We schedule the execution into layers and explicitly orchestrate the tape operands using a scratchpad. This reduces the required cache size and on-chip energy. iii) Finally, we stream the tape from DRAM by organizing it into a FIFO of tiles, so that the tape operands arrive just in time for each layer. Tapeflow, running on the same hardware, outperforms Enzyme, the state-of-the-art compiler, by 1.3–2.5×, reduces on-chip SRAM usage by 5–40×, and cuts on-chip energy by 8×. We demonstrate Tapeflow on a wide range of algorithms written in a general-purpose language.
Index Terms—Automatic differentiation, Gradients, Streaming Algorithms, Back propagation
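
To make the role of the tape concrete, here is a minimal Python sketch of reverse-mode differentiation for the toy function y = sum(sin(x_i) * x_i). It is an illustration only, not Tapeflow's implementation: the names `forward`, `reverse`, and `tile_size` are hypothetical, the per-iteration tuple merely imitates an array-of-structs record, and the tile loop merely imitates consuming the tape in fixed-size chunks rather than all at once.

```python
import math


def forward(xs, tape):
    """Compute y = sum(sin(x) * x) and record intermediates on the tape."""
    y = 0.0
    for x in xs:
        s = math.sin(x)
        c = math.cos(x)
        # One record per iteration: everything the reverse pass needs for this
        # element is stored together (an array-of-structs style layout).
        tape.append((x, s, c))
        y += s * x
    return y


def reverse(tape, tile_size=4):
    """Walk the tape backwards, one tile at a time, and return dy/dx_i."""
    grads = [0.0] * len(tape)
    end = len(tape)
    while end > 0:
        start = max(0, end - tile_size)
        # Hypothetical stand-in for fetching one tile of the tape into a
        # small on-chip buffer before it is used.
        tile = tape[start:end]
        for offset in range(len(tile) - 1, -1, -1):
            x, s, c = tile[offset]
            # d/dx (sin(x) * x) = cos(x) * x + sin(x)
            grads[start + offset] = c * x + s
        end = start
    return grads


if __name__ == "__main__":
    inputs = [0.0, 1.0, 2.0, 0.5, -1.0]
    tape = []
    y = forward(inputs, tape)
    print("y =", y)
    print("gradient =", reverse(tape, tile_size=2))
```

The forward pass writes every record long before the reverse pass reads it, which is the high reuse distance the abstract refers to. Only one tile of records is live at a time in `reverse`, which is the property a streamed tape would rely on.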

Mon 4 Mar

Displayed time zone: London

11:30 - 12:50
Machine-Learning Guided Optimizations
Main Conference at Tinto
Chair(s): Zheng Wang (University of Leeds)
11:30
20m
Talk
AskIt: Unified Programming Interface for Programming with Large Language Models
Main Conference
Katsumi Okuda (Massachusetts Institute of Technology; Mitsubishi Electric Corporation), Saman Amarasinghe (Massachusetts Institute of Technology)
11:50
20m
Talk
Revealing Compiler Heuristics through Automated Discovery and Optimization
Main Conference
Volker Seeker (Meta AI Research), Chris Cummins (Meta AI Research), Murray Cole (University of Edinburgh), Björn Franke (University of Edinburgh), Kim Hazelwood (Meta AI Research), Hugh Leather (Meta AI Research)
12:10
20m
Talk
SLaDe: A Portable Small Language Model Decompiler for Optimized Assembly
Main Conference
Jordi Armengol-Estapé (University of Edinburgh), Jackson Woodruff (University of Edinburgh), Chris Cummins (Meta AI Research), Michael F. P. O'Boyle (University of Edinburgh)
12:30
20m
Talk
TapeFlow: Streaming Gradient Tapes in Automatic Differentiation
Main Conference
Milad Hakimi (Simon Fraser University), Arrvindh Shriraman (Simon Fraser University)