Enhancing LLM to Decompile Optimized PTX to Readable CUDA for Tensor Programs
This program is tentative and subject to change.
The growing demand for high-performance tensor programs on GPUs, especially for large language models (LLMs), necessitates advanced compilation and optimization techniques. However, analyzing optimized, low-level PTX code, whether for performance tuning or for program understanding, remains a significant challenge. While LLMs hold promise for PTX-to-CUDA decompilation to improve code intelligibility, their effectiveness is severely limited by the scarcity of aligned training data and the inherent complexity of highly optimized, unrolled PTX code.
In this work, we explore methodologies to significantly enhance LLM capabilities for accurate and readable PTX-to-CUDA decompilation and present PtxDec, a decompilation prototype implementing our approach. To overcome the critical barrier of data scarcity, we develop a compiler-based data augmentation framework coupled with rigorous post-processing, enabling the creation of a large-scale, high-quality dataset of 400K aligned CUDA-PTX kernel pairs for effective LLM training. Furthermore, to empower LLMs to handle the complexity of optimized PTX, we introduce Rolled-PTX—an intermediate representation generated through heuristic loop rerolling during preprocessing. Rolled-PTX condenses unrolled patterns, drastically simplifying the input structure presented to the LLM and aligning it better with higher-level loop constructs.
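The abstract does not spell out the rerolling heuristic itself, so the following is only a minimal illustrative Python sketch of the general idea behind condensing unrolled PTX: normalize away register names and immediate offsets, detect consecutive repetitions of an instruction window, and collapse them into a single annotated body. The function names (normalize, reroll), the window-based heuristic, and the comment-style annotation are all hypothetical, not PtxDec's actual implementation.

import re

def normalize(inst: str) -> str:
    """Map register names and immediates to placeholders so that
    structurally identical instructions compare equal."""
    inst = re.sub(r"%[a-z]+\d+", "<reg>", inst)   # %f1, %r2, %rd3, ...
    inst = re.sub(r"\b\d+\b", "<imm>", inst)      # literal byte offsets
    return inst

def reroll(insts, max_body=8):
    """Collapse consecutive repetitions of an instruction window into a
    single annotated loop body (the Rolled-PTX idea, schematically)."""
    out, i = [], 0
    while i < len(insts):
        rolled = False
        for w in range(max_body, 0, -1):          # prefer longer bodies
            body = [normalize(x) for x in insts[i:i + w]]
            if len(body) < w:
                continue
            reps = 1
            while [normalize(x)
                   for x in insts[i + reps * w:i + (reps + 1) * w]] == body:
                reps += 1
            if reps >= 2:                          # worth rolling up
                out.append(f"// rolled: {reps} iterations of this {w}-instruction body")
                out.extend("//   " + x for x in insts[i:i + w])
                i += reps * w
                rolled = True
                break
        if not rolled:
            out.append(insts[i])
            i += 1
    return out

ptx = [
    "ld.global.f32 %f1, [%rd1+0];",
    "fma.rn.f32 %f2, %f1, %f1, %f1;",
    "st.global.f32 [%rd2+0], %f2;",
    "ld.global.f32 %f3, [%rd1+4];",
    "fma.rn.f32 %f4, %f3, %f3, %f3;",
    "st.global.f32 [%rd2+4], %f4;",
]
print("\n".join(reroll(ptx)))  # collapses the two ld/fma/st triples into one body

Here the two ld/fma/st triples differ only in register names and byte offsets, so they normalize to the same three-instruction window and collapse into a single two-iteration body. Aligned CUDA-PTX pairs of the kind used for training can in principle be harvested by compiling kernels with nvcc's -ptx flag (e.g., nvcc -ptx -arch=sm_80 kernel.cu -o kernel.ptx), although the paper's compiler-based augmentation framework and post-processing are more involved than this.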
Comprehensive evaluation demonstrates that PtxDec achieves substantial performance gains: our approach yields a 2.3×–3.1× improvement in functional accuracy over baseline methods, alongside significant enhancements in generated code readability and scheduling consistency with the original optimized kernels. Ablation studies further validate the contribution of each proposed component to the overall performance.
To the best of our knowledge, this is the first work to tackle PTX-to-CUDA decompilation, and it demonstrates effective strategies for augmenting LLMs to overcome the key challenges in this domain.
Mon 17 Nov (displayed time zone: Seoul)
14:00 - 15:30

14:00 (10m) Talk: Enhancing LLM to Decompile Optimized PTX to Readable CUDA for Tensor Programs. Research Papers. Xinyu Sun, Fugen Tang, Yu Zhang (University of Science and Technology of China); Han Shen, Chengru Song, Di Zhang (Kuaishou Technology)

14:10 (10m) Talk: Forcrat: Automatic I/O API Translation from C to Rust via Origin and Capability Analysis. Research Papers.

14:20 (10m) Talk: Polyglot: An Extensible Framework to Benchmark Code Translation with LLMs. Research Papers. Marco Vieira, Priyam Ashish Shah, Bhavain Shah, Rrezarta Krasniqi (University of North Carolina at Charlotte)

14:30 (10m) Talk: RFCScope: Detecting Logical Ambiguities in Internet Protocol Specifications. Research Papers. Mrigank Pawagi (Indian Institute of Science, Bengaluru); Lize Shao (Rice University, USA); Hyeonmin Lee, Yixin Sun, Wenxi Wang (University of Virginia)

14:40 (10m) Talk: Vision to Specification: Automating the Transition from Conceptual Features to Functional Requirements. Journal-First Track. Xiaoli Lian (Beihang University, China)

14:50 (10m) Talk: RustAssure: Differential Symbolic Testing for LLM-Transpiled C-to-Rust Code. Research Papers.

15:00 (10m) Talk: SPEC2CODE: Mapping Software Specification to Function-Level Code Implementation. Research Papers. Yuekun Wang (Singapore Management University), Lili Quan (Tianjin University), Xiaofei Xie (Singapore Management University), Junjie Wang (Tianjin University), Jianjun Chen (Tsinghua University)

15:10 (10m) Talk: RustRepoTrans: Repository-level Context Code Translation Benchmark Targeting Rust. Research Papers. Guangsheng Ou (Sun Yat-sen University), Mingwei Liu (Sun Yat-sen University), Yuxuan Chen, Yanlin Wang (Sun Yat-sen University), Xin Peng (Fudan University), Zibin Zheng (Sun Yat-sen University). Pre-print

15:20 (10m) Talk: DLBENCH: A Comprehensive Benchmark for SQL Translation with Large Language Models. Research Papers. Li Lin (Xiamen University), Hongqiao Chen (School of Informatics, Xiamen University), Qinglin Zhu (School of Informatics, Xiamen University), Liehang Chen (School of Informatics, Xiamen University), Linlong Tang (School of Informatics, Xiamen University), Rongxin Wu (Xiamen University)