SmartDispatch: Dynamic Substitution of NumPy-style APIs on Heterogeneous CPU-GPU Systems
The popularity of Python across application domains has driven widespread adoption of NumPy-style APIs. To improve performance, libraries such as PyTorch, JAX, CuPy, and cuPyNumeric offer GPU-compatible counterparts to NumPy functions. However, substituting NumPy with these alternatives is not always beneficial, because type conversion, data transfer, and kernel launches add overhead. We present SmartDispatch, a runtime framework that dynamically substitutes NumPy-style API calls with semantically equivalent implementations from other libraries to improve performance. Our system includes a knowledge base of equivalent APIs, a hardware-aware microbenchmarking component that identifies substitution thresholds, and a runtime substitution tool. Evaluation on four platforms with varying CPU-GPU architectures, using machine learning models from real-world benchmarks, shows that consistent performance gains (1.3× to 5.8×) can be achieved without code modification, demonstrating the effectiveness of cross-library substitution in heterogeneous environments.
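To make the interplay of the three components concrete, here is a minimal Python sketch of how a knowledge base of equivalent APIs and microbenchmark-derived thresholds might drive runtime dispatch. It is not SmartDispatch's implementation. The names `find_threshold` and `Dispatcher`, the timing numbers, and the use of input length as the sole dispatch criterion are all hypothetical assumptions. Plain Python functions stand in for a NumPy call and its GPU-backed equivalent so the example runs without NumPy or a GPU.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

# One hypothetical microbenchmark sample: (input size, baseline seconds, substitute seconds).
Timing = Tuple[int, float, float]


def find_threshold(timings: Sequence[Timing]) -> float:
    """Smallest benchmarked size from which the substitute wins at every larger
    benchmarked size; float("inf") if it does not win at the largest size."""
    threshold = float("inf")
    # Scan from the largest size downward; stop at the first size where the
    # substitute is not faster than the baseline.
    for size, base_t, sub_t in sorted(timings, reverse=True):
        if sub_t >= base_t:
            break
        threshold = size
    return threshold


@dataclass
class Dispatcher:
    # Hypothetical knowledge base: API name -> baseline implementation (e.g. a NumPy call).
    baselines: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    # API name -> (equivalent implementation, minimum input size at which to use it).
    substitutes: Dict[str, Tuple[Callable[..., Any], float]] = field(default_factory=dict)

    def register(self, name: str, baseline: Callable[..., Any],
                 substitute: Optional[Callable[..., Any]] = None,
                 timings: Sequence[Timing] = ()) -> None:
        self.baselines[name] = baseline
        if substitute is not None and timings:
            self.substitutes[name] = (substitute, find_threshold(timings))

    def select(self, name: str, size: int) -> Callable[..., Any]:
        # Use the substitute only at or above its benchmarked threshold.
        entry = self.substitutes.get(name)
        if entry is not None and size >= entry[1]:
            return entry[0]
        return self.baselines[name]

    def call(self, name: str, data: Sequence[Any], *args: Any, **kwargs: Any) -> Any:
        return self.select(name, len(data))(data, *args, **kwargs)


# Stand-ins for a CPU baseline and a GPU-backed equivalent (hypothetical).
def baseline_sum(xs):
    return sum(xs)


def substitute_sum(xs):
    return sum(xs)


# Made-up timings: the substitute loses on small inputs (launch/transfer
# overhead dominates) and wins from 100,000 elements upward.
timings = [(1_000, 1e-5, 5e-4), (100_000, 1e-3, 6e-4), (1_000_000, 1e-2, 9e-4)]
dispatcher = Dispatcher()
dispatcher.register("sum", baseline_sum, substitute_sum, timings)
```

With these timings the threshold is 100,000, so `dispatcher.select("sum", 10)` returns the baseline and `dispatcher.select("sum", 200_000)` returns the substitute. Taking the threshold as the start of the winning run at the largest sizes is a conservative choice that ignores isolated wins at small sizes; a real system would likely also key thresholds on the platform, dtype, and argument shapes.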