CGO 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Mon 4 Mar 2024 16:50 - 17:10 at Tinto - Custom Processors Chair(s): Rodrigo C. O. Rocha

Modern processors with large multi-banked register files often rely on hardware solutions to resolve bank conflicts efficiently. These hardware-based methods, while flexible, can incur runtime penalties and restrict the exploration of optimized hardware designs. Compiler-based methods for register bank assignment, in contrast, avoid runtime overhead. However, incorporating bank assignment into the complex register allocation process presents significant challenges, leading existing methods to adopt conservative approaches to avoid potential side effects.

This paper introduces the novel register allocation method PresCount, which enhances the coloring strategy for the Register Conflict Graph (RCG) and incorporates a bank pressure tracking mechanism to improve performance. The integrated register bank assigner in PresCount effectively reduces bank conflicts, achieving remarkable reductions of 43.28% and 27.76%, respectively, compared to existing methods on platforms with rich register banks and limited register budgets, as demonstrated by SPECfp and CNN-KERNEL benchmarks.

Furthermore, a subgroup splitting technique is introduced to facilitate register allocation under the bank-subgroup register file design, specifically our Domain-Specific Architecture (DSA) for AI computing. This technique demonstrates an impressive 99.85% reduction in bank conflicts for domain-specific kernel functions.

By addressing the challenges of bank conflicts in register allocation, the proposed PresCount method showcases significant improvements in performance and efficiency for platforms with different register configurations and domain-specific workloads, allowing for more flexible exploration of optimized hardware designs.
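
To make the idea of conflict-graph coloring with bank pressure tracking more concrete, here is a minimal sketch in Python. It is not the PresCount algorithm, whose details the abstract does not give. It is a generic greedy heuristic under stated assumptions: a conflict is a pair of virtual registers read by the same instruction, each bank holds a fixed number of registers, and "pressure" is simply the count of registers already placed in a bank (live ranges are ignored). The names `assign_banks`, `count_conflicts`, `num_banks`, and `regs_per_bank` are hypothetical.

```python
from collections import defaultdict


def assign_banks(conflicts, num_banks, regs_per_bank):
    """Greedily map virtual registers to banks (illustrative only).

    conflicts: iterable of (reg_a, reg_b) pairs that should not share a bank.
    Returns a dict {register: bank index}.
    """
    # Build the conflict graph as adjacency sets.
    neighbors = defaultdict(set)
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    pressure = [0] * num_banks  # registers placed in each bank so far
    bank_of = {}

    # Visit the most-constrained registers first (highest degree, ties by name).
    for reg in sorted(neighbors, key=lambda r: (-len(neighbors[r]), r)):
        used = {bank_of[n] for n in neighbors[reg] if n in bank_of}
        with_room = [b for b in range(num_banks) if pressure[b] < regs_per_bank]
        if not with_room:
            raise ValueError("not enough registers across all banks")
        # Prefer banks that no neighbor uses; otherwise accept a conflict.
        candidates = [b for b in with_room if b not in used] or with_room
        # Among the candidates, pick the bank with the lowest pressure.
        bank = min(candidates, key=lambda b: (pressure[b], b))
        bank_of[reg] = bank
        pressure[bank] += 1
    return bank_of


def count_conflicts(conflicts, bank_of):
    """Count conflict pairs whose two registers ended up in the same bank."""
    return sum(1 for a, b in conflicts if bank_of[a] == bank_of[b])


# A triangle (a, b, c) plus one extra edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
two_banks = assign_banks(edges, num_banks=2, regs_per_bank=2)
three_banks = assign_banks(edges, num_banks=3, regs_per_bank=2)
print(count_conflicts(edges, two_banks))    # 1: a triangle cannot be 2-colored
print(count_conflicts(edges, three_banks))  # 0
```

In this toy setup, two banks leave one unavoidable conflict, while a third bank removes it. A real allocator would also have to account for live ranges, spilling, and instruction-level operand constraints, none of which are modeled here.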

Mon 4 Mar

Displayed time zone: London

16:10 - 17:30
Custom Processors (Main Conference) at Tinto
Chair(s): Rodrigo C. O. Rocha Huawei
16:10
20m
Talk
AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators
Main Conference
Nicolas Bohm Agostini Northeastern University; Pacific Northwest National Laboratory, Jude Haris University of Glasgow, Perry Gibson University of Glasgow, Malith Jayaweera Northeastern University, norm rubin Northeastern University, Antonino Tumeo Pacific Northwest National Laboratory, José L. Abellán University of Murcia, José Cano University of Glasgow, David Kaeli Northeastern University
Pre-print
16:30
20m
Talk
Ecmas: Efficient Circuit Mapping and Scheduling for Surface Code
Main Conference
Mingzheng Zhu University of Science and Technology of China, Hao Fu University of Science and Technology of China, Jun Wu University of Science and Technology of China, Chi Zhang University of Science and Technology of China, Wei Xie University of Science and Technology of China, Xiang-Yang Li University of Science and Technology of China
Pre-print
16:50
20m
Talk
PresCount: Effective Register Allocation for Bank Conflict Reduction
Main Conference
Xiaofeng Guan Shanghai Jiao Tong University; Shanghai Enflame Technology, Hao Zhou Shanghai Enflame Technology, Guoqing Bao Shanghai Enflame Technology, Handong Li Shanghai Jiao Tong University, Liang Zhu Shanghai Jiao Tong University, Jianguo Yao Shanghai Jiao Tong University; Shanghai Enflame Technology
Pre-print
17:10
20m
Talk
Tackling the Matrix Multiplication Micro-kernel Generation with Exo
Main Conference
Adrián Castelló Universitat Politècnica de València, Julian Bellavita Cornell University, Grace Dinh University of California at Berkeley, Yuka Ikarashi Massachusetts Institute of Technology, Héctor Martínez Universidad de Córdoba
Pre-print