Auto-Vectorization for Image Processing DSLs
Parallelizing programs and distributing their workloads across multiple threads can be a challenging task. In addition to multi-threading, harnessing the vector units in CPUs is highly desirable. However, employing vector units to speed up programs can be quite tedious. A program developer either relies solely on the auto-vectorization capabilities of the compiler or manually applies vector intrinsics, which is extremely error-prone, difficult to maintain, and not portable at all. Based on whole-function vectorization, a method to replace control flow with data flow, we propose auto-vectorization techniques for image processing DSLs in the context of source-to-source compilation. The approach does not require the input to be available in SSA form. Moreover, we formulate constraints under which the vectorization analysis and code transformations may be greatly simplified in the context of image processing DSLs. As part of our methodology, we present control flow to data flow transformation as a source-to-source translation. We also propose a method to efficiently analyze algorithms with mixed bit-width data types to determine the optimal SIMD width, independently of the target instruction set. The techniques are integrated into an open source DSL framework. Subsequently, the vectorization capabilities are compared to a variety of existing state-of-the-art C/C++ compilers. Speedups of up to 7.4x over non-vectorized execution are observed for benchmarks taken from ISPC and image processing.
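To make the idea of replacing control flow with data flow concrete, the following is a minimal scalar sketch in C++. It is not taken from the paper or its DSL framework: the kernel, the threshold operation, and the names `kernel_branchy`, `kernel_select`, and `apply_select` are all hypothetical and chosen only for illustration. The branch is replaced by computing both outcomes and blending them with a mask, which is the form that maps naturally onto SIMD lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel kernel written with control flow:
// pixels below the threshold become 0, all others are brightened.
std::uint8_t kernel_branchy(std::uint8_t p, std::uint8_t threshold) {
    if (p < threshold) {
        return 0;
    } else {
        return static_cast<std::uint8_t>(p / 2 + 128);  // at most 127 + 128 = 255
    }
}

// The same kernel after a control flow to data flow conversion:
// both sides are evaluated unconditionally and a mask selects the result.
std::uint8_t kernel_select(std::uint8_t p, std::uint8_t threshold) {
    const std::uint8_t then_val = 0;
    const std::uint8_t else_val = static_cast<std::uint8_t>(p / 2 + 128);
    // mask is 0xFF where the condition holds and 0x00 otherwise.
    const std::uint8_t mask =
        static_cast<std::uint8_t>(-static_cast<int>(p < threshold));
    return static_cast<std::uint8_t>((then_val & mask) | (else_val & ~mask));
}

// Applies the branch-free kernel to a whole row of pixels. Because every
// iteration executes the same instructions, each iteration can be treated
// as one lane of a vector operation.
void apply_select(const std::vector<std::uint8_t>& in,
                  std::vector<std::uint8_t>& out,
                  std::uint8_t threshold) {
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = kernel_select(in[i], threshold);
    }
}
```

The blend costs extra work per pixel because both sides are always computed, so a conversion like this tends to pay off only when the work is spread across several lanes at once.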
Wed 21 Jun
10:50 - 12:30 | Session 1: Compiler Optimization for Embedded Systems | LCTES 2017 at Vertex WS208 | Chair(s): Yi Wang (Shenzhen University)
10:50 | 25m | Talk | AOT Vs. JIT: Impact of Profile Data on Code Quality | April W. Wade (University of Kansas), Prasad Kulkarni (University of Kansas), Michael Jantz (University of Tennessee)
11:15 | 25m | Talk | Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems | Ben Taylor (Lancaster University, UK), Vicent Sanz Marco (Lancaster University), Zheng Wang (Lancaster University)
11:40 | 25m | Talk | Auto-Vectorization for Image Processing DSLs | Oliver Reiche (Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)), Christof Kobylko, Frank Hannig (Friedrich-Alexander University Erlangen-Nürnberg (FAU)), Jürgen Teich
12:05 | 25m | Talk | Dynamic Translation of Structured Loads/Stores and Register Mapping for Architectures with SIMD Extensions | Sheng-Yu Fu, Ding-Yong Hong (Institute of Information Science, Academia Sinica), Ping Yu (Department of Computer Science and Information Engineering, National Taiwan University), Jan-Jan Wu (Institute of Information Science, Academia Sinica), Wei-Chung Hsu (Dept. Computer Science & Information Engineering, National Taiwan University)