OPass: Orchestrating TVM's Passes for Lowering Memory Footprints of Computation Graphs
Deep learning (DL) compilers, such as TVM and TensorFlow, encompass a variety of passes for optimizing computation graphs (i.e., DL models). Despite the efforts put into developing optimization passes, arranging these passes remains a challenge: most compilers employ fixed pass sequences that do not fit computation graphs of diverse structures; moreover, optimization passes have cascading effects, which makes the structures of graphs under compilation volatile and makes it difficult to generate optimal pass sequences for the graphs.
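For illustration, the sketch below builds a fixed Relay pass pipeline with tvm.transform.Sequential, the style of one-size-fits-all sequence referred to above. The particular passes and options are our own illustrative choice, not TVM's actual default sequence.

    import tvm
    from tvm import relay

    # A fixed pipeline: the same passes run in the same order regardless of the
    # structure of the input graph. The pass list here is only an example.
    fixed_pipeline = tvm.transform.Sequential(
        [
            relay.transform.SimplifyInference(),
            relay.transform.FoldConstant(),
            relay.transform.EliminateCommonSubexpr(),
            relay.transform.FuseOps(fuse_opt_level=2),
        ],
        opt_level=3,
    )

    # Applied wholesale to any Relay IRModule `mod`, whatever its structure:
    #     with tvm.transform.PassContext(opt_level=3):
    #         mod = fixed_pipeline(mod)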
Inspired by recent progress on statically computing the memory footprints (i.e., memory usages) of computation graphs, we introduce in this paper OPass, a novel approach to orchestrating TVM’s optimization passes for lowering the memory footprints of computation graphs, ultimately allowing the graphs to run on memory-constrained devices. The key idea is, given a computation graph G, to optimize the graph heuristically and iteratively: OPass learns the effects of passes on the graph; it then optimizes G iteratively, where each iteration selects a pass based on both the reduction it brings to G's memory footprint and its implicit effects on further optimizations, and then applies the selected pass.
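The following Python sketch illustrates the kind of greedy, iterative pass selection described above; it is not OPass's implementation. The footprint estimator and the implicit-effect score are hypothetical placeholders (estimate_memory_footprint, implicit_effect_bonus), and the candidate pass list is an arbitrary sample of real Relay passes.

    import tvm
    from tvm import relay

    # Candidate passes: an arbitrary sample of real Relay passes,
    # not OPass's full search space.
    CANDIDATE_PASSES = [
        relay.transform.FoldConstant(),
        relay.transform.SimplifyExpr(),
        relay.transform.EliminateCommonSubexpr(),
        relay.transform.DeadCodeElimination(),
        relay.transform.FuseOps(),
    ]

    def estimate_memory_footprint(mod: tvm.IRModule) -> float:
        # Crude stand-in for a static memory-footprint analysis (assumed helper);
        # it only measures the size of the printed IR so the sketch runs end to end.
        return float(len(str(mod)))

    def implicit_effect_bonus(opt_pass, mod: tvm.IRModule) -> float:
        # Placeholder for the learned "implicit effect" score of a pass (assumed).
        return 0.0

    def orchestrate(mod: tvm.IRModule, max_iters: int = 20) -> tvm.IRModule:
        # Greedily apply, at each iteration, the candidate pass with the best score.
        for _ in range(max_iters):
            current = estimate_memory_footprint(mod)
            best_score, best_mod = 0.0, None
            for opt_pass in CANDIDATE_PASSES:
                with tvm.transform.PassContext(opt_level=3):
                    candidate = opt_pass(mod)  # Relay passes are callable on IRModules
                reduction = current - estimate_memory_footprint(candidate)
                score = reduction + implicit_effect_bonus(opt_pass, candidate)
                if score > best_score:
                    best_score, best_mod = score, candidate
            if best_mod is None:  # no candidate improves the estimated footprint
                break
            mod = best_mod
        return mod

A real implementation would replace the estimator with the static footprint analysis the paper builds on and learn the implicit-effect term from observed pass interactions.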
We evaluate OPass on ReBench (a suite of computation graphs) and two real-world models (Transformer and ResNet). The results clearly show the strength of OPass: it outperforms TVM’s default pass sequence by 1.77x in reducing graphs’ memory footprints, at affordable cost; it also offers extra memory reductions of 5–12% by capturing the implicit effects of passes. Furthermore, OPass helps analyze the positive and negative effects of passes on graphs’ memory footprints, providing TVM developers with best practices for designing optimization pass sequences.