Tackling the Matrix Multiplication Micro-kernel Generation with Exo (CGO 2024 - Main Conference)

Who

Adrián Castelló, Julian Bellavita, Grace Dinh, Yuka Ikarashi, Héctor Martínez

Track

CGO 2024 Main Conference

Time Zone

The program is currently displayed in (GMT) London.

When

Mon 4 Mar 2024 17:10 - 17:30 at Tinto - Custom Processors Chair(s): Rodrigo C. O. Rocha

Abstract

Optimizing matrix multiplication (gemm) has been a persistent need over the last few decades. This operation is considered the flagship of current linear algebra libraries such as BLIS, OpenBLAS, or Intel oneAPI because of its widespread use in a large variety of scientific applications. gemm is usually implemented following the GotoBLAS philosophy, which tiles the gemm operands and uses a series of nested loops to improve performance. These implementations extract the maximum computational power of the architecture through small pieces of hardware-oriented, high-performance code called micro-kernels. However, this approach forces developers to write, with non-negligible effort, a dedicated micro-kernel for each new hardware platform.

In this work, we present a step-by-step procedure for generating micro-kernels with the Exo compiler that perform close to (or even better than) manually developed micro-kernels written with intrinsic functions or assembly language. Our solution also improves the portability of the generated code, since a hardware target is fully specified by a concise library-based description of its instructions.
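The GotoBLAS-style structure mentioned in the abstract (tiled operands, a series of nested loops, and a small micro-kernel at the innermost level) can be illustrated with a minimal pure-Python sketch. This is not the authors' code and does not use Exo. The function names (`micro_kernel`, `gemm_blocked`) and the block-size parameters (`mc`, `nc`, `kc`, `mr`, `nr`) are assumptions chosen for illustration, and the operand packing that real libraries perform is omitted.

```python
def micro_kernel(A, B, C, i0, j0, p0, mr, nr, kc):
    """Update the mr x nr tile of C at (i0, j0) using kc rank-1 updates."""
    # Local accumulator: stands in for the vector registers that a
    # hand-written or generated micro-kernel would keep the tile in.
    acc = [[0.0] * nr for _ in range(mr)]
    for p in range(p0, p0 + kc):
        for i in range(mr):
            a = A[i0 + i][p]
            for j in range(nr):
                acc[i][j] += a * B[p][j0 + j]
    # Write the accumulated tile back into C.
    for i in range(mr):
        for j in range(nr):
            C[i0 + i][j0 + j] += acc[i][j]


def gemm_blocked(A, B, C, mc=4, nc=4, kc=4, mr=2, nr=2):
    """Compute C += A * B with five loops around a micro-kernel.

    A is m x k, B is k x n, C is m x n (lists of lists).
    Block sizes are illustrative, not tuned for any hardware.
    """
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):              # column blocks of B and C
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):          # blocks along the shared dimension
            kb = min(kc, k - pc)
            for ic in range(0, m, mc):      # row blocks of A and C
                i_end = min(ic + mc, m)
                for jr in range(jc, j_end, nr):      # micro-tiles inside the block
                    for ir in range(ic, i_end, mr):
                        micro_kernel(A, B, C, ir, jr, pc,
                                     min(mr, i_end - ir),
                                     min(nr, j_end - jr),
                                     kb)
    return C
```

Calling `gemm_blocked(A, B, C)` with `C` initialised to zeros should give the same result as a naive triple loop. In an optimized library, only `micro_kernel` would be hardware-specific, which is the part the talk proposes to generate.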

Link to Preprint

https://arxiv.org/pdf/2310.17408.pdf

Adrián Castelló

Universitat Politècnica de València

Spain

Julian Bellavita

Cornell University

United States

Grace Dinh

University of California at Berkeley

United States

Yuka Ikarashi

Massachusetts Institute of Technology

United States

Héctor Martínez

Universidad de Córdoba

Spain

Session Program

Mon 4 Mar
Displayed time zone: London

16:10 - 17:30	Custom Processors (Main Conference) at Tinto. Chair(s): Rodrigo C. O. Rocha (Huawei)

16:10 20m Talk		AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators. Main Conference. Nicolas Bohm Agostini (Northeastern University; Pacific Northwest National Laboratory), Jude Haris (University of Glasgow), Perry Gibson (University of Glasgow), Malith Jayaweera (Northeastern University), Norm Rubin (Northeastern University), Antonino Tumeo (Pacific Northwest National Laboratory), José L. Abellán (University of Murcia), José Cano (University of Glasgow), David Kaeli (Northeastern University)
16:30 20m Talk		Ecmas: Efficient Circuit Mapping and Scheduling for Surface Code. Main Conference. Mingzheng Zhu, Hao Fu, Jun Wu, Chi Zhang, Wei Xie, Xiang-Yang Li (all University of Science and Technology of China)
16:50 20m Talk		PresCount: Effective Register Allocation for Bank Conflict Reduction. Main Conference. Xiaofeng Guan (Shanghai Jiao Tong University; Shanghai Enflame Technology), Hao Zhou (Shanghai Enflame Technology), Guoqing Bao (Shanghai Enflame Technology), Handong Li (Shanghai Jiao Tong University), Liang Zhu (Shanghai Jiao Tong University), Jianguo Yao (Shanghai Jiao Tong University; Shanghai Enflame Technology)
17:10 20m Talk		Tackling the Matrix Multiplication Micro-kernel Generation with Exo. Main Conference. Adrián Castelló (Universitat Politècnica de València), Julian Bellavita (Cornell University), Grace Dinh (University of California at Berkeley), Yuka Ikarashi (Massachusetts Institute of Technology), Héctor Martínez (Universidad de Córdoba)