APLAS 2025
Mon 27 - Thu 30 October 2025 Bengaluru, India

Tensor program optimization is a non-convex optimization problem, and solving it efficiently while balancing optimization cost against execution performance remains challenging. Search-based tensor program compilers have proven effective by constructing large-scale exploration spaces that contain potentially high-performance program variants, thereby overcoming the performance bottlenecks of traditional program optimization methods. However, these approaches still face significant challenges in their search strategies: existing compilers often require hours or even days to identify the optimal program representation. This paper proposes ELTC, an end-to-end tensor program compilation framework based on large language models (LLMs), designed for efficient optimization of tensor programs in deep neural networks. ELTC formulates tensor program exploration as a generation task for language models. An offline-trained large language model generates transformation sequences for tensor programs end-to-end from their feature representations, which significantly improves optimization efficiency while preserving the broad search space. Moreover, we introduce a language-model-friendly intermediate representation that encodes key features of tensor programs in a structured textual format, and on top of this representation we construct a tensor program dataset tailored for language models. Experimental results demonstrate that ELTC achieves superior performance in both optimization quality and tuning speed. Compared with the fully converged Ansor-TenSet, ELTC achieves a 34.07× compilation speedup and an average performance improvement of 1.06× at convergence. Furthermore, ELTC outperforms the manually optimized kernel library TensorRT, achieving a 1.3× performance gain.

Keywords: Program Transformation · Compiler · Large Language Models.
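
To make the generation formulation above concrete, the following is a minimal, illustrative Python sketch (not taken from the paper) of how a tensor program's loop-nest features might be serialized into a structured textual IR and mapped to a transformation sequence. All names (`LoopFeature`, `to_textual_ir`, `generate_transform_sequence`) are hypothetical; in ELTC the transformation sequence would be decoded by the offline-trained LLM, which is stubbed here with a fixed example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified feature record for one loop of a tensor operator.
@dataclass
class LoopFeature:
    name: str      # loop variable, e.g. "i"
    extent: int    # trip count
    kind: str      # "spatial" or "reduction"

# Hypothetical structured textual IR: encodes key features of a tensor
# program as plain text so a language model can consume it as a prompt.
def to_textual_ir(op_name: str, loops: List[LoopFeature]) -> str:
    lines = [f"op: {op_name}"]
    for lf in loops:
        lines.append(f"loop {lf.name} extent={lf.extent} kind={lf.kind}")
    return "\n".join(lines)

# Stub standing in for the offline-trained language model: maps the textual
# IR to a transformation (schedule) sequence. A real system would decode the
# sequence from the model instead of returning a fixed example.
def generate_transform_sequence(textual_ir: str) -> List[str]:
    return ["split i 32", "split j 32",
            "reorder i.outer j.outer i.inner j.inner",
            "vectorize j.inner"]

if __name__ == "__main__":
    matmul = [LoopFeature("i", 1024, "spatial"),
              LoopFeature("j", 1024, "spatial"),
              LoopFeature("k", 1024, "reduction")]
    ir_text = to_textual_ir("matmul_1024", matmul)
    print(ir_text)
    print(generate_transform_sequence(ir_text))
```

The design point this sketch assumes is that encoding program features as structured text lets schedule search be cast as sequence generation, so a single forward pass of the model replaces an iterative search over the transformation space.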