AstraGame: Effective and Efficient VLM Agent Serving for Large-Scale Game Testing in an Industry Setting
Automated game testing requires deep semantic understanding to navigate complex dynamic environments, yet directly applying end-to-end Vision-Language Models (VLMs) at industrial scale incurs prohibitive latency and token costs while delivering limited testing effectiveness. In this paper, we introduce AstraGame, a VLM-based game testing framework deployed at WeChat, one of the world’s largest mini-game platforms. To resolve the conflict between reasoning depth and execution speed, AstraGame employs a decoupled architecture in which specialized small models collaborate with large cognitive VLMs. Specifically, we propose three components: UIBrain, which parallelizes perception and reasoning tasks for temporal saturation; UIBase, which implements widget-level semantic caching that turns repetitive inference into efficient lookup; and UIFormer, which structures action protocols for token efficiency. Evaluation results show that AstraGame improves exploration coverage by 37.78% compared to the state-of-the-practice approach. In deployment across 24,000 unique mini-games, AstraGame reduces end-to-end latency by 58% compared to sequential execution and increases the proportion of games yielding valid and comprehensive auditing materials from 38% to 63%. To date, AstraGame has identified over 180,000 functional and compliance issues, validating that industrial-scale intelligence requires shifting focus from isolated VLM inference to holistic system orchestration.
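To make the idea of widget-level semantic caching concrete, the following is a minimal sketch under stated assumptions, not AstraGame's actual implementation. It assumes a widget can be reduced to a normalized (kind, label) key, so that a repeated widget is resolved by lookup instead of a second model call. The names `Widget`, `widget_key`, and `WidgetCache`, and the `infer` callback standing in for a VLM call, are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Widget:
    """Hypothetical widget descriptor, e.g. produced by a small perception model."""
    kind: str   # widget type such as "button" or "icon"
    label: str  # visible text or predicted label


def widget_key(widget: Widget) -> str:
    """Build a stable cache key from normalized widget attributes (assumed scheme)."""
    normalized = f"{widget.kind.strip().lower()}|{widget.label.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class WidgetCache:
    """Cache widget semantics so repeated widgets skip the expensive model call."""

    def __init__(self, infer: Callable[[Widget], str]) -> None:
        self._infer = infer              # stand-in for a costly VLM inference call
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def semantics(self, widget: Widget) -> str:
        key = widget_key(widget)
        if key in self._store:
            # Cache hit: answer by lookup, no model call.
            self.hits += 1
            return self._store[key]
        # Cache miss: run inference once and remember the result.
        self.misses += 1
        result = self._infer(widget)
        self._store[key] = result
        return result
```

In this sketch the key is an exact match after normalization. Whether the deployed system matches widgets exactly or by some similarity measure is not stated in the abstract, so the keying scheme here should be read as an illustrative assumption.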