Cross-ISA Machine Instrumentation Using Fast and Scalable Dynamic Binary Translation
The rise in instruction set architecture (ISA) diversity and the growing
adoption of virtual machines are driving a need for fast, scalable,
full-system, cross-ISA emulation and instrumentation tools. Unfortunately,
achieving high performance for these cross-ISA tools is challenging due to
dynamic binary translation (DBT) overhead and the complexity of instrumenting
full-system emulators.
In this paper we improve cross-ISA emulation and instrumentation performance
through three novel techniques. First, we increase floating point (FP)
emulation performance by observing that most FP operations can be correctly
emulated by surrounding the use of the host FP unit with a minimal amount
of non-FP code. Second, we introduce the design of a
translator with a shared code cache that scales for multi-core guests, even
when they generate translated code in parallel at a high rate. Third, we present
an ISA-agnostic instrumentation layer that can instrument guest operations
that occur outside of the DBT's intermediate representation (IR), which are
common in full-system emulators.
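
As a rough illustration of the first technique, the following C sketch wraps a host FP addition in a few cheap checks and falls back to a slow path whenever those checks fail. It is a minimal sketch under stated assumptions, not Qelt's or QEMU's actual code: the `guest_fp_state` type, the flag and rounding-mode constants, `soft_add`, and `guest_add` are all hypothetical names, the host FPU is assumed to be in its default round-to-nearest-even mode, and the specific fast-path conditions (inexact flag already set, default rounding, zero or normal operands) are one plausible choice that the abstract itself does not spell out.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guest FP state; these names are illustrative only. */
enum { GUEST_FLAG_INEXACT = 1u, GUEST_FLAG_OVERFLOW = 2u };
enum { GUEST_ROUND_NEAREST_EVEN = 0, GUEST_ROUND_TO_ZERO = 1 };

typedef struct {
    int rounding_mode;   /* guest rounding mode */
    unsigned flags;      /* sticky guest exception flags */
} guest_fp_state;

/* Counts slow-path invocations so the fast path can be observed. */
static int slow_path_calls;

/*
 * Placeholder for a complete software FP routine. A real one would
 * compute the result and all exception flags in integer code; this
 * stand-in only counts calls and returns a + b so the sketch runs.
 */
static double soft_add(double a, double b, guest_fp_state *s)
{
    (void)s;
    slow_path_calls++;
    return a + b;
}

/* A real emulator would classify operands from their raw bit patterns. */
static bool is_zero_or_normal(double x)
{
    int c = fpclassify(x);
    return c == FP_ZERO || c == FP_NORMAL;
}

static double guest_add(double a, double b, guest_fp_state *s)
{
    assert(s != NULL);

    /* Cheap checks before using the host FPU (assumed conditions). */
    bool fast = (s->flags & GUEST_FLAG_INEXACT) != 0 &&
                s->rounding_mode == GUEST_ROUND_NEAREST_EVEN &&
                is_zero_or_normal(a) && is_zero_or_normal(b);
    if (!fast) {
        return soft_add(a, b, s);
    }

    /* Host FPU does the work; assumes host round-to-nearest-even. */
    double r = a + b;

    if (isinf(r)) {
        /* Finite operands produced infinity: record overflow. */
        s->flags |= GUEST_FLAG_OVERFLOW;
    } else if (fabs(r) <= DBL_MIN) {
        /* Tiny or zero result: conservatively redo it on the slow path. */
        return soft_add(a, b, s);
    }
    return r;
}
```

With the inexact flag already set and default rounding, a call such as `guest_add(1.5, 2.25, &s)` stays on the host FPU, while a guest using round-to-zero is routed to `soft_add`. The design choice illustrated here is that the common case only pays for a few comparisons around a single host FP instruction.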
We implement our approach in Qelt, a high-performance cross-ISA machine emulator
and instrumentation tool based on QEMU. Our results show that Qelt scales
to 32 cores when emulating a guest machine used for parallel compilation,
which demonstrates scalable code translation. Furthermore, experiments
based on SPEC06 show that Qelt (1) outperforms QEMU as a full-system cross-ISA
machine emulator by $1.76\times$/$2.18\times$ for integer/FP workloads,
(2) outperforms state-of-the-art, cross-ISA, full-system instrumentation
tools by $1.5\times$-$3\times$, and (3) can match the performance of Pin, a
state-of-the-art, same-ISA DBI tool, when used for complex instrumentation such
as cache simulation.
Sun 14 Apr, 13:30 - 15:35 (Eastern Time, US & Canada)
13:30 | 25m | Talk | Cross-ISA Machine Instrumentation Using Fast and Scalable Dynamic Binary Translation | Research Papers
13:55 | 25m | Talk | The Janus Triad: Exploiting Parallelism through Dynamic Binary Modification | Research Papers | Ruoyu Zhou (University of Cambridge, UK), George Wort (University of Cambridge, UK), Marton Erdos (University of Cambridge, UK), Timothy M. Jones (University of Cambridge, UK)
14:20 | 25m | Talk | Mitigating JIT Compilation Latency in Virtual Execution Environments | Research Papers | Martin Kristien (University of Edinburgh, UK), Tom Spink (University of Edinburgh), Harry Wagstaff (University of Edinburgh, UK), Björn Franke (University of Edinburgh, UK), Igor Böhm (Synopsys, Austria), Nigel Topham (University of Edinburgh, UK)
14:45 | 25m | Talk | ScissorGC: Scalable and Efficient Compaction for Java Full Garbage Collection | Research Papers | Haoyu Li (Shanghai Jiao Tong University, China), Mingyu Wu (Shanghai Jiao Tong University, China), Binyu Zang (Shanghai Jiao Tong University, China), Haibo Chen (Shanghai Jiao Tong University, China)
15:10 | 25m | Talk | Stochastic Resource Allocation | Research Papers | Liran Funaro (Technion, Israel), Orna Agmon Ben-Yehuda (Technion, Israel), Assaf Schuster (Technion, Israel)