FSE 2026
Sun 5 - Thu 9 July 2026 Montreal, Canada

With the wide adoption of GPUs in high-performance computing, CUDA programming has become essential for leveraging GPU parallelism. However, its complex programming model makes performance optimization challenging, and CUDA programs often suffer from performance problems as a result. It is therefore crucial to understand the performance problems specific to CUDA programming. Unfortunately, no systematic study of them has been conducted in the literature.

To bridge this gap, we conduct the first systematic study of CUDA performance problems. We 1) characterize the symptoms and root causes of 216 performance problems collected from 55 StackOverflow posts and 122 NVIDIA forum posts, and 2) measure the speedup obtained by fixing performance problems and assess the capability of existing performance analysis methods, using a dataset of 69 reproduced performance problems. Our findings provide practical guidance for developers and opportunities for researchers to advance performance analysis.