LCTES 2017
Wed 21 - Thu 22 June 2017 Barcelona, Spain
co-located with PLDI 2017

A growing number of modern processors support non-contiguous SIMD data accesses. However, the translation of such instructions has been overlooked in the Dynamic Binary Translation (DBT) area. For example, in the popular QEMU dynamic binary translator, guest memory instructions with strides are emulated by a sequence of scalar instructions, leaving significant room for performance improvement when the host machine has SIMD instructions available. Structured loads/stores, such as VLDn/VSTn in ARM NEON, are one type of strided SIMD data access instruction. They are widely used in signal processing, multimedia, mathematical, and 2D matrix transposition applications. Efficient translation of such structured loads/stores is a critical issue when migrating ARM executables to other ISAs. It is quite challenging, however: not only is the translation of structured loads/stores non-trivial, but the difference between guest and host register configurations must also be taken into consideration. In this work, we present the design and implementation of translating structured loads/stores in DBT, including target code generation as well as efficient SIMD register mapping. Our proposed register mapping mechanisms are not limited to handling structured loads/stores; they can be extended to deal with normal SIMD instructions. On a set of OpenCV benchmarks, our QEMU-based system achieves a maximum speedup of 5.41x, with an average improvement of 2.93x. On a set of BLAS benchmarks, our system obtains a maximum speedup of 2.19x and an average improvement of 1.63x.
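
For readers unfamiliar with structured loads/stores, the following is a minimal sketch of the memory-to-register layout that a VLDn/VSTn-style instruction implies. It is an illustration only, not code from the paper or from QEMU: the function names `vld_n` and `vst_n`, the list-based model of memory and registers, and the parameter names are all assumptions made for this example. The element-by-element loops are written in the scalar style that the abstract describes as the slow emulation path.

```python
from typing import List, Sequence


def vld_n(memory: Sequence[int], base: int, n: int, lanes: int) -> List[List[int]]:
    """Model a VLDn-style structured load (hypothetical helper).

    Memory holds `lanes` consecutive structures of `n` elements each.
    Element i of structure k sits at memory[base + k * n + i] and is
    placed in lane k of register i, so the data is de-interleaved.
    """
    return [[memory[base + k * n + i] for k in range(lanes)] for i in range(n)]


def vst_n(memory: List[int], base: int, regs: Sequence[Sequence[int]]) -> None:
    """Model a VSTn-style structured store (hypothetical helper).

    Lane k of register i is written to memory[base + k * n + i],
    which interleaves the registers back into structures.
    """
    n = len(regs)
    for i, reg in enumerate(regs):
        for k, value in enumerate(reg):
            memory[base + k * n + i] = value


# Four interleaved (x, y) pairs: x0, y0, x1, y1, ...
pairs = [10, 11, 20, 21, 30, 31, 40, 41]
xs, ys = vld_n(pairs, base=0, n=2, lanes=4)
# xs == [10, 20, 30, 40] and ys == [11, 21, 31, 41]
```

Each register receives every n-th element, which is why a scalar emulation needs one memory access per element, whereas a host with suitable SIMD shuffle or gather support can in principle do the same work with far fewer instructions.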

Wed 21 Jun

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

10:50 - 12:30
Session 1: Compiler Optimization for Embedded Systems
LCTES 2017 at Vertex WS208
Chair(s): Yi Wang Shenzhen University
10:50
25m
Talk
AOT Vs. JIT: Impact of Profile Data on Code Quality
April W. Wade University of Kansas, Prasad Kulkarni University of Kansas, Michael Jantz University of Tennessee
11:15
25m
Talk
Adaptive Optimization for OpenCL Programs on Embedded Heterogeneous Systems
Ben Taylor Lancaster University, UK, Vicent Sanz Marco Lancaster University, Zheng Wang Lancaster University
11:40
25m
Talk
Auto-Vectorization for Image Processing DSLs
Oliver Reiche Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Christof Kobylko, Frank Hannig Friedrich-Alexander University Erlangen-Nürnberg (FAU), Jürgen Teich
12:05
25m
Talk
Dynamic Translation of Structured Loads/Stores and Register Mapping for Architectures with SIMD Extensions
Sheng-Yu Fu, Ding-Yong Hong Institute of Information Science, Academia Sinica, Ping Yu Department of Computer Science and Information Engineering, National Taiwan University, Jan-Jan Wu Institute of Information Science, Academia Sinica, Wei-Chung Hsu Dept. Computer Science & Information Engineering, National Taiwan University