Python Programmers have GPUs too: Automatic Python Loop Parallelization with Staged Dependence Analysis (DLS 2019)

Who

Dejice Jacob, Phil Trinder, Jeremy Singer

Track

DLS 2019

Time Zone

The program is currently displayed in (GMT+03:00) Beirut.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Sun 20 Oct 2019 15:00 - 15:30 at Room 2A - Optimizing Computations. Chair(s): Marc Feeley

Abstract

Python is a popular language for end-user software development in many application domains. End-users want to harness parallel compute resources effectively, by exploiting commodity manycore technology including GPUs. However, existing approaches to parallelism in Python are esoteric, and generally seem too complex for the typical end-user developer. We argue that implicit, or automatic, parallelization is the best way to deliver the benefits of manycore to end-users, since it avoids domain-specific languages, specialist libraries, complex annotations or restrictive language subsets. Auto-parallelization fits the Python philosophy, provides effective performance, and is convenient for non-expert developers.

Despite being a dynamic language, we show that Python is a suitable target for auto-parallelization given that its semantics are simpler than traditional imperative languages like C and Fortran. In an empirical study of 3000+ open-source Python notebooks, we demonstrate that typical loop behaviour ‘in the wild’ is amenable to auto-parallelization. We show that staging the dependence analysis is an effective way to maximize performance. We apply classical techniques for static dependence analysis, then leverage the rich introspection capabilities of the Python runtime to resolve additional loop bounds and variable types in a just-in-time manner. The parallel loop nest code is then converted to CUDA kernels for GPU execution. We achieve orders of magnitude speedup over baseline interpreted execution and some speedup (up to 50x, although not consistently) over CPU JIT-compiled execution, across 12 loop-intensive standard benchmarks.
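The staging idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; the function names (`static_stage`, `runtime_stage`, `can_parallelize`) are hypothetical, and the sketch assumes Python 3.9+ and only understands array subscripts of the form `i` or `i + k`. It ignores aliasing between arrays and side effects of calls, and it stops at the parallel/sequential verdict rather than generating CUDA kernels. The first stage inspects the loop source; whatever it cannot settle (a symbolic offset, an unknown trip count) is deferred until concrete values are available at run time.

```python
import ast


def offset_from(index, loop_var):
    """Offset k of a subscript shaped `i` or `i + k` (k: int or variable name)."""
    if isinstance(index, ast.Name) and index.id == loop_var:
        return 0
    if (isinstance(index, ast.BinOp) and isinstance(index.op, ast.Add)
            and isinstance(index.left, ast.Name) and index.left.id == loop_var):
        right = index.right
        if isinstance(right, ast.Constant) and isinstance(right.value, int):
            return right.value
        if isinstance(right, ast.Name) and right.id != loop_var:
            return right.id
    return None  # a subscript shape this sketch does not understand


def static_stage(source):
    """Stage 1: return (verdict, pairs); verdict is 'parallel', 'sequential' or 'deferred'."""
    loop = ast.parse(source).body[0]
    if not (isinstance(loop, ast.For) and isinstance(loop.target, ast.Name)):
        return "sequential", []
    writes, reads = [], []
    for stmt in loop.body:
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                return "sequential", []  # scalar write: stay conservative
            if isinstance(node, ast.Subscript):
                if not isinstance(node.value, ast.Name):
                    return "sequential", []  # e.g. a[i][j] or obj.a[i]
                access = (node.value.id, offset_from(node.slice, loop.target.id))
                (writes if isinstance(node.ctx, ast.Store) else reads).append(access)
    # Every write is paired with every access (read or write) to the same array.
    pairs = [(w, x) for arr, w in writes for other, x in reads + writes if arr == other]
    if any(w is None or x is None for w, x in pairs):
        return "sequential", []
    if all(isinstance(w, int) and isinstance(x, int) and w == x for w, x in pairs):
        return "parallel", pairs
    return "deferred", pairs  # needs run-time values to decide


def runtime_stage(pairs, trip_count, env):
    """Stage 2: a pair conflicts across iterations iff 0 < |w - x| < trip_count."""
    def value(offset):
        return env[offset] if isinstance(offset, str) else offset
    return all(not 0 < abs(value(w) - value(x)) < trip_count for w, x in pairs)


def can_parallelize(source, trip_count, env):
    verdict, pairs = static_stage(source)
    if verdict == "deferred":
        return runtime_stage(pairs, trip_count, env)
    return verdict == "parallel"


shifted = "for i in range(n):\n    a[i + k] = a[i] + 1\n"
print(static_stage(shifted)[0])                               # deferred
print(can_parallelize(shifted, trip_count=10, env={"k": 100}))  # True
print(can_parallelize(shifted, trip_count=10, env={"k": 3}))    # False
```

In this sketch the static stage alone already accepts an element-wise loop such as `c[i] = a[i] + b[i]`, while the shifted loop is decided only once `k` and the trip count are known, which is the kind of case the abstract describes as resolved in a just-in-time manner.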

Link to Publication

https://dl.acm.org/citation.cfm?doid=3359619.3359743

Authorizer Link

http://www.dcs.gla.ac.uk/~jacobd/ALPyNA_Python_Parallelization_DLS19.pdf

DOI

https://doi.org/10.1145/3359619.3359743

Dejice Jacob

University of Glasgow

Phil Trinder

University of Glasgow

Jeremy Singer

University of Glasgow

United Kingdom

Session Program

Sun 20 Oct
Displayed time zone: (GMT+03:00) Beirut

14:00 - 15:30	Optimizing Computations (DLS 2019) at Room 2A. Chair(s): Marc Feeley (Université de Montréal)

14:00 30m Talk		Reflections on the Compatibility, Performance, and Scalability of Parallel Python. Experience Paper, DLS 2019. Remigius Meier (ETH Zurich, Switzerland), Thomas Gross (ETH Zurich, Switzerland)
14:30 30m Talk		R Melts Brains -- An IR for First-Class Environments and Lazy Effectful Arguments. Research Paper, DLS 2019. Olivier Flückiger (Northeastern University), Guido Chari (Czech Technical University), Jan Ječmen (Czech Technical University), Ming-Ho Yee (Northeastern University), Jakob Hain (Northeastern University), Jan Vitek (Northeastern University)
15:00 30m Talk		Python Programmers have GPUs too: Automatic Python Loop Parallelization with Staged Dependence Analysis. Research Paper, DLS 2019. Dejice Jacob (University of Glasgow), Phil Trinder (University of Glasgow), Jeremy Singer (University of Glasgow)