Python Programmers have GPUs too: Automatic Python Loop Parallelization with Staged Dependence Analysis
Python is a popular language for end-user software development in many application domains. End-users want to harness parallel compute resources effectively, by exploiting commodity manycore technology including GPUs. However, existing approaches to parallelism in Python are esoteric, and generally seem too complex for the typical end-user developer. We argue that implicit, or automatic, parallelization is the best way to deliver the benefits of manycore to end-users, since it avoids domain-specific languages, specialist libraries, complex annotations or restrictive language subsets. Auto-parallelization fits the Python philosophy, provides effective performance, and is convenient for non-expert developers.
We show that Python, despite being a dynamic language, is a suitable target for auto-parallelization, given that its semantics are simpler than those of traditional imperative languages like C and Fortran. In an empirical study of 3000+ open-source Python notebooks, we demonstrate that typical loop behaviour ‘in the wild’ is amenable to auto-parallelization. We show that staging the dependence analysis is an effective way to maximize performance. We first apply classical techniques for static dependence analysis, then leverage the rich introspection capabilities of the Python runtime to resolve additional loop bounds and variable types in a just-in-time manner. The parallel loop nest code is then converted to CUDA kernels for GPU execution. Across 12 loop-intensive standard benchmarks, we achieve orders of magnitude speedup over baseline interpreted execution and some speedup (up to 50x, although not consistently) over CPU JIT-compiled execution.
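To make the idea of staging concrete, the following is a minimal sketch and not the authors' implementation. It assumes Python 3.9+, a single loop of the form `for i in range(n)`, and array indices of the form `i`, `i + k` or `i - k`. The names `offset_of` and `analyse` are hypothetical. Called without runtime values, `analyse` acts as the static stage; called with an `env` of runtime values, it resolves loop bounds and symbolic offsets that the static stage had to leave open.

```python
import ast


def offset_of(sub, var):
    """Classify the index of `sub` as `var + k`.

    Returns an int k, a str naming a symbolic k, or None if the index
    does not have that form.
    """
    idx = sub.slice  # Python 3.9+: the index expression itself
    if isinstance(idx, ast.Name) and idx.id == var:
        return 0
    if (isinstance(idx, ast.BinOp) and isinstance(idx.op, (ast.Add, ast.Sub))
            and isinstance(idx.left, ast.Name) and idx.left.id == var):
        rhs, neg = idx.right, isinstance(idx.op, ast.Sub)
        if isinstance(rhs, ast.Constant) and isinstance(rhs.value, int):
            return -rhs.value if neg else rhs.value
        if isinstance(rhs, ast.Name) and not neg:
            return rhs.id
    return None


def analyse(src, env=None):
    """Return 'parallel', 'sequential' or 'unknown' for one range(n) loop.

    `env` maps variable names to runtime values (second stage);
    leaving it out gives the purely static answer (first stage).
    """
    env = env or {}
    loop = ast.parse(src).body[0]
    var = loop.target.id
    bound = loop.iter.args[0]
    trip = bound.value if isinstance(bound, ast.Constant) else env.get(bound.id)

    writes, reads = {}, {}
    for node in ast.walk(loop):
        if (isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
                and node.id != var):
            return "sequential"  # scalar written in the body: stay serial
        if isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name):
            off = offset_of(node, var)
            if isinstance(off, str):
                off = env.get(off)  # None unless the runtime value is known
            side = writes if isinstance(node.ctx, ast.Store) else reads
            side.setdefault(node.value.id, set()).add(off)

    for arr, w_offs in writes.items():
        offs = w_offs | reads.get(arr, set())
        if None in offs:
            return "unknown"  # some index could not be resolved
        # Distance between each write and every other access to the array.
        dists = {abs(w - o) for w in w_offs for o in offs if w != o}
        if dists and trip is None:
            return "unknown"  # need the trip count to decide
        if any(d < trip for d in dists):
            return "sequential"  # two iterations touch the same element
    return "parallel"


independent = "for i in range(n):\n    c[i] = a[i] + b[i]\n"
carried = "for i in range(100):\n    a[i] = a[i - 1] + 1\n"
symbolic = "for i in range(n):\n    a[i] = a[i + m] * 2\n"

print(analyse(independent))                # parallel
print(analyse(carried))                    # sequential
print(analyse(symbolic))                   # unknown
print(analyse(symbolic, {"n": 8, "m": 8})) # parallel
print(analyse(symbolic, {"n": 8, "m": 2})) # sequential
```

The third loop shows why a second stage can help: statically, the offset `m` and the bound `n` are unknown, so the sketch cannot decide. Once both values are available at run time, the same check shows whether the read and write ranges overlap. The sketch is deliberately conservative and ignores nested subscripts, aliasing and reductions.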
Sun 20 Oct
15:00 - 15:30, Research Paper
Dejice Jacob (University of Glasgow), Phil Trinder (University of Glasgow), Jeremy Singer (University of Glasgow)