LAGrad: Statically Optimized Differentiable Programming in MLIR
Automatic differentiation (AD) is a central algorithm in deep learning and the emerging field of differentiable programming. However, the performance of AD remains a significant bottleneck in these fields. Training large models requires repeatedly evaluating gradients via AD, potentially millions of times. Additionally, the most common form of AD, reverse mode, incurs a memory cost that is asymptotically larger than that of the original function being differentiated.
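To make the memory cost concrete, the sketch below shows a toy tape-based reverse-mode differentiation of f(x) = sin(x)^2 in Python. It is purely illustrative and is not LAGrad's implementation: it only demonstrates why naive reverse mode records an intermediate value for every primal operation, so memory grows with the length of the computation rather than with the size of the original function.

```python
# Minimal reverse-mode AD sketch (illustrative only, not LAGrad's implementation).
# The "tape" records every intermediate of the forward pass so the backward
# pass can replay it in reverse; memory therefore grows with the number of
# primal operations.
import math

def f_with_tape(x):
    """Forward pass of f(x) = sin(x)^2, recording intermediates on a tape."""
    tape = []
    s = math.sin(x)
    tape.append(("sin", x))      # stored for the backward pass
    y = s * s
    tape.append(("square", s))   # stored for the backward pass
    return y, tape

def backward(tape, dy=1.0):
    """Walk the tape in reverse, accumulating the adjoint of the input."""
    grad = dy
    for op, arg in reversed(tape):
        if op == "square":
            grad = grad * 2.0 * arg        # d(s^2)/ds = 2s
        elif op == "sin":
            grad = grad * math.cos(arg)    # d(sin x)/dx = cos x
    return grad

y, tape = f_with_tape(0.5)
print(y, backward(tape))   # f(0.5) and f'(0.5) = 2*sin(0.5)*cos(0.5)
```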
This paper introduces LAGrad, a reverse-mode, source-to-source AD system that leverages high-level information in MLIR to produce efficient differentiated code. LAGrad employs a collection of novel static optimizations that benefit from the semantics of high-level MLIR dialects to exploit the sparsity and structured control flow of generated code.
Using these optimizations, LAGrad achieves speedups of up to $2.8\times$ and uses $35\times$ less memory relative to state-of-the-art AD systems on real-world machine learning and computer vision benchmarks.
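As an illustration of the kind of structural sparsity the abstract refers to (a hypothetical example, not LAGrad's generated code): the Jacobian of an elementwise operation such as ReLU is diagonal, so a structure-aware adjoint can use an elementwise multiply instead of materializing a dense n-by-n Jacobian.

```python
# Sketch of structural sparsity in a gradient (illustrative, not LAGrad output).
# For an elementwise op y = relu(x), the Jacobian is diagonal, so the
# vector-Jacobian product needs only an elementwise multiply.
import numpy as np

def relu_vjp_dense(x, dy):
    """Naive adjoint: materialize the full (diagonal) Jacobian. O(n^2) memory."""
    J = np.diag((x > 0).astype(x.dtype))
    return dy @ J

def relu_vjp_sparse(x, dy):
    """Structure-aware adjoint: keep only the diagonal. O(n) memory."""
    return dy * (x > 0)

x = np.array([-1.0, 0.5, 2.0, -3.0])
dy = np.ones_like(x)
assert np.allclose(relu_vjp_dense(x, dy), relu_vjp_sparse(x, dy))
```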
Sun 26 Feb (times in Eastern Time, US & Canada)

11:20 - 12:20 | Optimizations (Research Papers), St. Laurent 3
Chair: Louis-Noël Pouchet, Colorado State University, USA

11:20 (20 min talk) A Hotspot-Driven Semi-automated Competitive Analysis Framework for Identifying Compiler Key Optimizations
Wenlong Mu, Yilei Zhang, Bo Huang, Jianmei Guo (East China Normal University); Shiqiang Cui (Hangzhou Hongjun Microelectronics Technology)

11:40 (20 min talk) LAGrad: Statically Optimized Differentiable Programming in MLIR

12:00 (20 min talk) Lazy Evaluation for the Lazy: Automatically Transforming Call-by-Value into Call-by-Need
Breno Campos Ferreira Guimarães, Fernando Magno Quintão Pereira (Federal University of Minas Gerais)