DelayRepay: Delayed Execution for Kernel Fusion in Python
Thu 19 Nov 2020 23:20 - 23:40 at SPLASH-III - 7 | Chair(s): Mihaela Sighireanu
Python is a popular, dynamic language for data science and scientific computing.
To ensure efficiency, many important numerical libraries are implemented in statically compiled native languages.
However, performance suffers when switching between native and non-native code, especially if data has to be converted between native arrays and Python data structures.
This problem becomes particularly acute as GPU accelerators are used more widely, because data and control have to be transferred repeatedly between the accelerator and the host.
In this paper, we present DelayRepay, a delayed execution framework for numeric Python programs.
It avoids excessive switching and data transfer by using lazy evaluation and kernel fusion.
Using DelayRepay, operations on NumPy arrays are executed lazily, allowing multiple calls to accelerator kernels to be fused together dynamically.
DelayRepay is available as a drop-in replacement for existing Python libraries.
This approach delivers significant performance improvements over the state of the art and is invisible to the application programmer.
We show that our approach provides a maximum 377× speedup over NumPy, a 409% increase over the state of the art.
Thu 19 Nov
Displayed time zone: Central Time (US & Canada)
11:00 - 12:20 | 7 | SAS / DLS 2020 at SPLASH-III +12h | Chair(s): Tim Felgentreff (Oracle Labs, Potsdam), Kedar Namjoshi (Nokia Bell Labs)
11:00 | 20m | Research paper | Interprocedural Shape Analysis Using Separation Logic-based Transformer Summaries | SAS | Hugo Illous (CEA & INRIA / ENS Paris), Matthieu Lemerre (CEA LIST, France), Xavier Rival (INRIA/CNRS/ENS Paris)
11:20 | 20m | Talk | DelayRepay: Delayed Execution for Kernel Fusion in Python | DLS 2020 | John Magnus Morton (University of Edinburgh), Kuba Kaszyk (University of Edinburgh), Lu Li (Linköping University), Jiawen Sun (University of Edinburgh), Christophe Dubach (McGill University), Michel Steuwer (The University of Edinburgh), Murray Cole (University of Edinburgh, UK), Michael F. P. O'Boyle (University of Edinburgh)
11:40 | 20m | Research paper | Stratified Guarded First-order Transition Systems | SAS | Christian Müller (Technische Universität München, Saarland University), Helmut Seidl (Technische Universität München)
12:00 | 20m | Talk | Sampling Optimized Code for Type Feedback | DLS 2020 | Olivier Flückiger (Northeastern University), Andreas Wälchli (University of Bern), Sebastián Krynski (Czech Technical University, National University of Quilmes), Jan Vitek (Northeastern University / Czech Technical University)
23:00 - 00:20 | The same session, 12 hours after the one above, with the talks in the same order: 23:00 Interprocedural Shape Analysis Using Separation Logic-based Transformer Summaries; 23:20 DelayRepay: Delayed Execution for Kernel Fusion in Python; 23:40 Stratified Guarded First-order Transition Systems; 00:00 Sampling Optimized Code for Type Feedback