Automated Bug Frame Retrieval from Gameplay Videos Using Vision-Language Models
Modern game studios deliver new builds and patches at a rapid pace, generating thousands of bug reports, many of which embed gameplay videos. To verify and triage these bug reports, developers must watch the submitted videos. This manual review is labour-intensive, slow, and hard to scale. In this paper, we introduce an automated pipeline that reduces each video to a single frame that best matches the reported bug description, giving developers instant visual evidence that pinpoints the bug.
Our pipeline begins with FFmpeg for keyframe extraction, reducing each video to a median of just 1.90% of its original frames while still capturing bug moments in 98.79% of cases. These keyframes are then evaluated by a vision-language model (GPT-4o), which ranks them based on how well they match the textual bug description and selects the most representative frame. We evaluated this approach using real-world developer-submitted gameplay videos and JIRA bug reports from a popular First-Person Shooter (FPS) game. The pipeline achieves an overall F1 score of 0.79 and Success of 0.89 for the top-1 retrieved frame. Performance is highest for the Lighting & Shadow (F1 = 0.94), Physics & Collision (0.86), and UI & HUD (0.83) bug categories, and lowest for Animation & VFX (0.51).
By replacing video viewing with an immediately informative image, our approach dramatically reduces manual effort and speeds up triage and regression checks, offering practical benefits to quality assurance (QA) teams and developers across the game industry.
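The two stages described in the abstract (keyframe extraction, then ranking against the bug description) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "keyframes" means the video's I-frames, and it uses FFmpeg's `-skip_frame nokey` decoder option with `-vsync vfr` (newer FFmpeg builds prefer `-fps_mode vfr`). The helper names `keyframe_command` and `pick_top_frame` and the `score` callback are hypothetical; `score` stands in for whatever vision-language model call rates a frame against the description.

```python
from pathlib import Path
from typing import Callable, Sequence


def keyframe_command(video: str, out_dir: str) -> list[str]:
    """Build an FFmpeg command that decodes only keyframes (I-frames).

    Assumption: the exact flags used in the paper are not given in the
    abstract; this is one common way to extract keyframes.
    """
    return [
        "ffmpeg",
        "-skip_frame", "nokey",   # decoder option: skip all non-keyframes
        "-i", video,
        "-vsync", "vfr",          # do not duplicate frames to fill gaps
        str(Path(out_dir) / "frame_%04d.png"),
    ]


def pick_top_frame(
    frames: Sequence[str],
    description: str,
    score: Callable[[str, str], float],
) -> str:
    """Return the frame whose score against the description is highest.

    `score(frame_path, description)` is a hypothetical stand-in for a
    vision-language model that rates how well a frame shows the bug.
    """
    if not frames:
        raise ValueError("no keyframes to rank")
    return max(frames, key=lambda frame: score(frame, description))


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no
    # dependency on FFmpeg being installed.
    print(" ".join(keyframe_command("bug_report.mp4", "keyframes")))

    # Toy scores in place of a real model call.
    toy_scores = {"frame_0001.png": 0.2, "frame_0002.png": 0.9}
    best = pick_top_frame(
        list(toy_scores),
        "Player shadow flickers near the doorway",
        lambda frame, _desc: toy_scores[frame],
    )
    print(best)
```

In practice the command list would be passed to `subprocess.run(..., check=True)`, and `score` would wrap a model request. Keeping the scorer as a callback lets the ranking step be tested without network access.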
Wed 15 Apr (displayed time zone: Brasilia, Distrito Federal, Brazil)
11:00 - 12:30 | AI for Software Engineering 3 | SE In Practice (SEIP) at Europa II | Chair(s): Eric Bodden, Heinz Nixdorf Institute at Paderborn University & Fraunhofer IEM
11:00 (15m) Talk | Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices | SE In Practice (SEIP) | Lingzhe Zhang Peking University, China, Tong Jia Institute for Artificial Intelligence, Peking University, Beijing, China, Yunpeng Zhai Alibaba Group, Leyi Pan Tsinghua University, Chiming Duan Peking University, Minghua He Peking University, Mengxi Jia Institute of Artificial Intelligence, China Telecom, Ying Li School of Software and Microelectronics, Peking University, Beijing, China
11:15 (15m) Talk | R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning | SE In Practice (SEIP) | Yilun Liu Huawei co. LTD, Chen Ziang Huawei co. LTD; Nankai University, Song Xu Huawei co. LTD, Minggui He Huawei co. LTD, Shimin Tao University of Science and Technology of China; Huawei co. LTD, Weibin Meng Huawei co. LTD, Yuming Xie Huawei co. LTD, Tao Han Huawei co. LTD, Chunguang Zhao Huawei co. LTD, Jingzhou Du Huawei co. LTD, Daimeng Wei Huawei co. LTD, Shenglin Zhang Nankai University, Yongqian Sun Nankai University
11:30 (15m) Talk | LLM-Based Automated Diagnosis Of Integration Test Failures At Google | SE In Practice (SEIP)
11:45 (15m) Talk | Automated Bug Frame Retrieval from Gameplay Videos Using Vision-Language Models | SE In Practice (SEIP) | Wentao Lu University of Alberta, Alexander Senchenko Electronic Arts, Abram Hindle University of Alberta, Cor-Paul Bezemer University of Alberta
12:00 (15m) Talk | Finding the Needle in the Crash Stack: Industrial-Scale Crash Root Cause Localization with AutoCrashFL | SE In Practice (SEIP) | Sungmin Kang NUS, Sumi Yun SAP Labs Korea, Jingun Hong SAP Labs Korea, Shin Yoo KAIST, Gabin An Korea University
12:15 (15m) Talk | PerFrame: Monitoring GUI Loading Performance in Mobile Apps via Semantic Distinguish | SE In Practice (SEIP) | Jianing Liu Fudan University, Shiyu Guo, Yongxiang Hu Fudan University, Yu Zhang Meituan, Hailiang Jin Meituan Inc., Juxing Yuan Meituan Inc., Yangfan Zhou Fudan University, Xin Wang Fudan University