Enhancing LLM to Decompile Optimized PTX to Readable CUDA for Tensor Programs
This program is tentative and subject to change.
The growing demand for high-performance tensor programs on GPUs, especially for large language models (LLMs), necessitates advanced compilation and optimization techniques. However, analyzing optimized, low-level PTX code, whether for performance tuning or for program understanding, remains a significant challenge. While LLMs hold promise for PTX-to-CUDA decompilation to improve code intelligibility, their effectiveness is severely limited by the scarcity of aligned training data and the inherent complexity of highly optimized, unrolled PTX code.
In this work, we explore methodologies to significantly enhance LLM capabilities for accurate and readable PTX-to-CUDA decompilation and present PtxDec, a decompilation prototype implementing our approach. To overcome the critical barrier of data scarcity, we develop a compiler-based data augmentation framework coupled with rigorous post-processing, enabling the creation of a large-scale, high-quality dataset of 400K aligned CUDA-PTX kernel pairs for effective LLM training. Furthermore, to empower LLMs to handle the complexity of optimized PTX, we introduce Rolled-PTX—an intermediate representation generated through heuristic loop rerolling during preprocessing. Rolled-PTX condenses unrolled patterns, drastically simplifying the input structure presented to the LLM and aligning it better with higher-level loop constructs.
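The abstract does not spell out the rerolling heuristic itself, so the following is only a minimal illustrative Python sketch of the general idea behind condensing unrolled PTX: normalize away register names and immediate offsets, detect consecutive repetitions of an instruction window, and collapse them into a single annotated body. The function names (normalize, reroll), the window-based heuristic, and the comment-style annotation are all hypothetical, not PtxDec's actual implementation.

import re

def normalize(inst: str) -> str:
    """Map register names and immediates to placeholders so that
    structurally identical instructions compare equal."""
    inst = re.sub(r"%[a-z]+\d+", "<reg>", inst)   # %f1, %r2, %rd3, ...
    inst = re.sub(r"\b\d+\b", "<imm>", inst)      # literal byte offsets
    return inst

def reroll(insts, max_body=8):
    """Collapse consecutive repetitions of an instruction window into a
    single annotated loop body (the Rolled-PTX idea, schematically)."""
    out, i = [], 0
    while i < len(insts):
        rolled = False
        for w in range(max_body, 0, -1):          # prefer longer bodies
            body = [normalize(x) for x in insts[i:i + w]]
            if len(body) < w:
                continue
            reps = 1
            while [normalize(x)
                   for x in insts[i + reps * w:i + (reps + 1) * w]] == body:
                reps += 1
            if reps >= 2:                          # worth rolling up
                out.append(f"// rolled: {reps} iterations of this {w}-instruction body")
                out.extend("//   " + x for x in insts[i:i + w])
                i += reps * w
                rolled = True
                break
        if not rolled:
            out.append(insts[i])
            i += 1
    return out

ptx = [
    "ld.global.f32 %f1, [%rd1+0];",
    "fma.rn.f32 %f2, %f1, %f1, %f1;",
    "st.global.f32 [%rd2+0], %f2;",
    "ld.global.f32 %f3, [%rd1+4];",
    "fma.rn.f32 %f4, %f3, %f3, %f3;",
    "st.global.f32 [%rd2+4], %f4;",
]
print("\n".join(reroll(ptx)))  # collapses the two ld/fma/st triples into one body

Here the two ld/fma/st triples differ only in register names and byte offsets, so they normalize to the same three-instruction window and collapse into a single two-iteration body. Aligned CUDA-PTX pairs of the kind used for training can in principle be harvested by compiling kernels with nvcc's -ptx flag (e.g., nvcc -ptx -arch=sm_80 kernel.cu -o kernel.ptx), although the paper's compiler-based augmentation framework and post-processing are more involved than this.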
Comprehensive evaluation demonstrates that PtxDec achieves substantial performance gains: our approach yields a 2.3×–3.1× improvement in functional accuracy over baseline methods, alongside significant enhancements in generated code readability and scheduling consistency with the original optimized kernels. Ablation studies further validate the contribution of each proposed component to the overall performance.
To the best of our knowledge, this is the first work to tackle PTX-to-CUDA decompilation, and it demonstrates effective strategies for augmenting LLMs to overcome the key challenges in this domain.
Mon 17 Nov (displayed time zone: Seoul)
14:00 - 15:30

14:00 (10m) Talk: Enhancing LLM to Decompile Optimized PTX to Readable CUDA for Tensor Programs. Research Papers. Xinyu Sun, Fugen Tang, Yu Zhang (University of Science and Technology of China); Han Shen, Chengru Song, Di Zhang (Kuaishou Technology)

14:10 (10m) Talk: Forcrat: Automatic I/O API Translation from C to Rust via Origin and Capability Analysis. Research Papers.

14:20 (10m) Talk: Polyglot: An Extensible Framework to Benchmark Code Translation with LLMs. Research Papers. Marco Vieira, Priyam Ashish Shah, Bhavain Shah, Rrezarta Krasniqi (University of North Carolina at Charlotte)

14:30 (10m) Talk: RFCScope: Detecting Logical Ambiguities in Internet Protocol Specifications. Research Papers. Mrigank Pawagi (Indian Institute of Science, Bengaluru); Lize Shao (Rice University, USA); Hyeonmin Lee, Yixin Sun, Wenxi Wang (University of Virginia)

14:40 (10m) Talk: Vision to Specification: Automating the Transition from Conceptual Features to Functional Requirements. Journal-First Track. Xiaoli Lian (Beihang University, China)

14:50 (10m) Talk: RustAssure: Differential Symbolic Testing for LLM-Transpiled C-to-Rust Code. Research Papers.

15:00 (10m) Talk: SPEC2CODE: Mapping Software Specification to Function-Level Code Implementation. Research Papers. Yuekun Wang (Singapore Management University), Lili Quan (Tianjin University), Xiaofei Xie (Singapore Management University), Junjie Wang (Tianjin University), Jianjun Chen (Tsinghua University)

15:10 (10m) Talk: RustRepoTrans: Repository-level Context Code Translation Benchmark Targeting Rust. Research Papers. Guangsheng Ou (Sun Yat-sen University), Mingwei Liu (Sun Yat-sen University), Yuxuan Chen, Yanlin Wang (Sun Yat-sen University), Xin Peng (Fudan University), Zibin Zheng (Sun Yat-sen University). Pre-print

15:20 (10m) Talk: DLBENCH: A Comprehensive Benchmark for SQL Translation with Large Language Models. Research Papers. Li Lin (Xiamen University), Hongqiao Chen (School of Informatics, Xiamen University), Qinglin Zhu (School of Informatics, Xiamen University), Liehang Chen (School of Informatics, Xiamen University), Linlong Tang (School of Informatics, Xiamen University), Rongxin Wu (Xiamen University)