APLAS 2024
Tue 22 - Fri 25 October 2024 Kyoto

Can one build a compiler that will eventually save a lot of performance engineering effort while immediately delivering competitive results? Can one achieve near hardware-peak performance while offering high-level programming abstractions, and do it now rather than putting it off until tomorrow? The question is particularly pressing in a domain-specific setting, where classical infrastructure for optimizing compiler construction may be inadequate, too generic, or too low-level.

I used to see the world through the lens of domain-specific optimization as superior to lower-level kernel programming. And indeed, after decades of trying in the broader supercomputing landscape, the optimizing compiler community finally has reasons to celebrate. ML frameworks and compilers have essentially won the abstraction and automation battle. Purely functional embedded DSLs such as JAX offer performance portability from edge devices to data centers, and massive code reuse across diverse scenarios, from massive-scale pre-training to low-latency inference through distillation and quantization.

Yet, the generative AI race and the global scarcity of computing resources (ML accelerators) have just set ML practitioners back a decade when it comes to achieving competitive performance. Device-specific kernel programming has become, once again, ubiquitous. The performance of leading generative models comes with a dramatic loss of programmability and portability.

Why and how did we get here? Are we ever going to find a way out of this programmability/performance dilemma? What about the velocity and agility of compiler engineers? Can ML accelerate compiler construction and help address the issue? Can we make ML-based heuristics scale enough to compile billions of lines of code? Didn't we just design and implement MLIR precisely so that domain-specific compiler engineering could scale, enabling massive code reuse across domains, languages, and hardware? We will review these questions in light of recent successes and half-successes in academia and industry, and extend an invitation to tackle these challenges in future research and software development.

Albert Cohen is a Research Scientist at Google DeepMind in Paris, working on the acceleration and energy efficiency of machine learning models. An alumnus of École Normale Supérieure de Lyon and the University of Versailles (Paris-Saclay), he first joined INRIA and later also held a part-time associate professor position at École Polytechnique. He has been a visiting scholar at the University of Illinois, an invited professor at Philips Research as a recipient of a Marie Curie technology transfer fellowship, and a visiting professor at Facebook Artificial Intelligence Research. Albert's work spans the theory and practice of programming languages, parallelism, high-performance and power-efficient computing, as well as safety-critical embedded control, resulting in 250 peer-reviewed publications together with 30 PhD students and international collaborators. Some of this work led to technology transfer, including contributions to the industry-standard GCC and LLVM compilers. Since joining Google, Albert has contributed to the design and adoption of the MLIR platform for scalable and efficient machine learning.