Just-in-time (JIT) code generation techniques are gaining traction due to their ability to significantly speed up computations. Small and medium matrix multiplications are widely used in machine learning (ML) and in scientific simulations in high-performance computing, and these computations can benefit greatly from JIT compilation. Prior runtime code generation approaches have primarily focused on loop unrolling and address compression techniques that optimize the instruction flow. In contrast, we consider a unique code generation approach in which data values are dynamically embedded in instructions as immediate values, effectively transforming a memory load into an immediate load. Using Intel's AVX-512 vector extensions, we show that our technique achieves geometric mean speedups of 1.12$\times$ and 1.05$\times$ over MKL and MKL JIT, the current state-of-the-art JIT library, for 32 input channels. We intend to open-source our JIT library and plan to apply our approach to produce high-performance convolution libraries.
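The core idea of embedding data values into generated code can be sketched at a high level, independent of AVX-512. The following is a minimal illustration (not the paper's actual library): a dot-product kernel is specialized at runtime by writing the weight values into the generated source as literals, so the compiled kernel reads them as immediates instead of loading them from a weights array. The function name `jit_dot` and the use of Python `exec` as the "code generator" are purely illustrative assumptions.

```python
# Hypothetical sketch of value-embedding JIT specialization:
# the weights become literals ("immediates") in the generated code,
# replacing per-element memory loads from a weights array.
def jit_dot(weights):
    # Each weight is emitted as a literal in the generated source text.
    terms = " + ".join(f"x[{i}]*{w!r}" for i, w in enumerate(weights))
    src = f"def kernel(x):\n    return {terms}\n"
    ns = {}
    exec(src, ns)  # "JIT-compile" the specialized kernel at runtime
    return ns["kernel"]

kernel = jit_dot([2.0, 3.0, 4.0])
print(kernel([1.0, 1.0, 1.0]))  # 9.0
```

A native-code generator would apply the same transformation at the instruction level, e.g. materializing constants via immediate-form move instructions rather than vector loads from memory.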
Tue 2 Mar