DelayRepay: Delayed Execution for Kernel Fusion in Python
Thu 19 Nov 2020 11:20 - 11:40 at SPLASH-III - 7. Chair(s): Kedar Namjoshi, Tim Felgentreff
Python is a popular, dynamic language for data science and scientific computing.
To ensure efficiency, many major numerical libraries are implemented in statically compiled native languages.
However, performance suffers when switching between native and non-native code, especially if data has to be converted between native arrays and Python data structures.
As GPU accelerators are increasingly used, this problem becomes particularly acute.
Data and control have to be repeatedly transferred between the accelerator and the host.
In this paper, we present DelayRepay, a delayed execution framework for numeric Python programs.
It avoids excessive switching and data transfer by using lazy evaluation and kernel fusion.
Using DelayRepay, operations on NumPy arrays are executed lazily, allowing multiple calls to accelerator kernels to be fused together dynamically.
DelayRepay is available as a drop-in replacement for existing Python libraries.
This approach significantly improves performance over the state of the art while remaining invisible to the application programmer.
We show that our approach provides a maximum 377× speedup over NumPy, a 409% increase over the state of the art.
Thu 19 Nov. Times are displayed in time zone: Central Time (US & Canada)
11:00 - 12:20: 7 SAS / DLS 2020 at SPLASH-III (+12h). Chair(s): Kedar Namjoshi (Nokia Bell Labs), Tim Felgentreff (Oracle Labs, Potsdam)
11:00 - 11:20 Research paper | Interprocedural Shape Analysis Using Separation Logic-based Transformer Summaries (SAS) | Hugo Illous (CEA & INRIA / ENS Paris), Matthieu Lemerre (CEA LIST, France), Xavier Rival (INRIA/CNRS/ENS Paris)
11:20 - 11:40 Talk | DelayRepay: Delayed Execution for Kernel Fusion in Python (DLS 2020) | John Magnus Morton (University of Edinburgh), Kuba Kaszyk (University of Edinburgh), Lu Li (Linköping University), Jiawen Sun (University of Edinburgh), Christophe Dubach (McGill University), Michel Steuwer (The University of Edinburgh), Murray Cole (University of Edinburgh, UK), Michael F. P. O'Boyle (University of Edinburgh)
11:40 - 12:00 Research paper | Stratified Guarded First-order Transition Systems (SAS) | Christian Müller (Technische Universität München, Saarland University), Helmut Seidl (Technische Universität München)
12:00 - 12:20 Talk | Sampling Optimized Code for Type Feedback (DLS 2020) | Olivier Flückiger (Northeastern University), Andreas Wälchli (University of Bern), Sebastián Krynski (Czech Technical University, National University of Quilmes), Jan Vitek (Northeastern University / Czech Technical University)