Practical Escape of Exploration Tarpits for Mini-Game Testing in an Industrial Setting
Attracting over one billion registered users globally, WeChat’s mini-game platform has become one of the largest gaming platforms, with more than one hundred thousand published mini-games. To ensure a quality experience across this massive number of mini-games, automated UI testing has become essential for WeChat. However, sliding-gesture-induced exploration tarpits, i.e., states where a testing tool becomes trapped in repetitive, unsuccessful gesture attempts, cause the tool to waste up to 98% of its testing budget because it cannot execute the proper sliding gestures. Although mini-games typically contain visual hints (e.g., sliding indicators) that guide the desired sliding gestures, exploiting these hints to escape exploration tarpits faces two major challenges in industrial settings: (1) a robustness challenge, because the hints must be exploited from only a few discontinuous screenshots, and (2) an efficiency challenge, because the approach must support thousands of concurrent testing services with minimal overhead and cost. To address these challenges, we report our experiences in developing and deploying SLIDESCOUT, a three-stage approach for detecting and escaping sliding-gesture-induced exploration tarpits via efficient exploitation of visual hints. First, SLIDESCOUT monitors the testing progress and detects sliding indicators concurrently with screenshot collection, improving efficiency by reusing the preprocessed results in subsequent stages. Second, SLIDESCOUT reconstructs potential sliding trajectories using multiple heuristics, addressing the robustness challenge when precise trajectories are unavailable due to discontinuous screenshots. Third, SLIDESCOUT applies the inferred sliding gestures until it successfully escapes the tarpit, enabling easy integration with existing testing tools.
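The three stages can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not SLIDESCOUT's actual design: the tarpit rule (the screen state stays unchanged over a fixed window of actions), the `Indicator` type, the three trajectory heuristics, and the names `in_tarpit`, `candidate_gestures`, and `escape` are all hypothetical. Indicator detection itself (e.g., a vision model run on each screenshot) is out of scope; the sketch assumes it has already produced hints ordered by screenshot time.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]
Gesture = Tuple[Point, Point]  # hypothetical swipe representation: (start, end)


@dataclass
class Indicator:
    """Hypothetical detected sliding hint: position plus optional arrow direction."""
    x: int
    y: int
    direction: Optional[Tuple[int, int]] = None  # e.g. (1, 0) means "swipe right"


def in_tarpit(state_ids: Sequence[str], window: int = 5) -> bool:
    """Stage 1 (assumed rule): flag a tarpit when the last `window` actions
    all left the screen in the same state."""
    if len(state_ids) < window:
        return False
    return len(set(state_ids[-window:])) == 1


def candidate_gestures(hints: List[Indicator], screen: Tuple[int, int],
                       length: int = 200) -> List[Gesture]:
    """Stage 2 (assumed heuristics): rebuild candidate swipes from hints seen in
    a few discontinuous screenshots, most hint-specific candidates first."""
    w, h = screen
    raw: List[Gesture] = []
    # Heuristic A: a hint seen at different positions suggests a path from
    # its earliest to its latest observed position (gaps are interpolated by the swipe).
    if len(hints) >= 2:
        first, last = hints[0], hints[-1]
        if (first.x, first.y) != (last.x, last.y):
            raw.append(((first.x, first.y), (last.x, last.y)))
    # Heuristic B: an arrow-shaped hint suggests swiping from it along its direction.
    for hint in hints:
        if hint.direction is not None:
            dx, dy = hint.direction
            raw.append(((hint.x, hint.y), (hint.x + dx * length, hint.y + dy * length)))
    # Heuristic C (fallback): swipe from the screen center in four directions.
    cx, cy = w // 2, h // 2
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        raw.append(((cx, cy), (cx + dx * length, cy + dy * length)))

    def clamp(p: Point) -> Point:
        # Keep swipe endpoints inside the screen bounds.
        return (min(max(p[0], 0), w - 1), min(max(p[1], 0), h - 1))

    # Clamp endpoints and drop duplicates while preserving priority order.
    seen = set()
    candidates: List[Gesture] = []
    for start, end in raw:
        g = (clamp(start), clamp(end))
        if g not in seen:
            seen.add(g)
            candidates.append(g)
    return candidates


def escape(candidates: Sequence[Gesture],
           perform: Callable[[Gesture], None],
           current_state: Callable[[], str]) -> Optional[Gesture]:
    """Stage 3: apply candidate swipes until the screen state changes.
    Returns the gesture that escaped the tarpit, or None if none worked."""
    before = current_state()
    for g in candidates:
        perform(g)
        if current_state() != before:
            return g
    return None
```

Ordering the candidates from most to least hint-specific means the generic fallback swipes are tried only after the visual hints fail, which keeps wasted actions low. When no candidate changes the state, `escape` returns `None`, leaving the host testing tool to resume its normal exploration.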
Deployed internally at WeChat for six months, SLIDESCOUT has helped reveal 25,000 crashes and 120,000 JavaScript errors, detecting 50% more crashes than the pre-deployment baseline over the same time period. We summarize three major lessons learned from developing and deploying SLIDESCOUT.