Minuku: Detecting Diverse Display Issues in Mobile Apps with a Small-Scale Dataset
User interface (UI) display issues, such as widget occlusion, missing elements, and screen overflow, are emerging as a non-negligible source of user complaints in commercial mobile apps. However, existing automated testing tools typically rely on vast amounts of high-quality training data, making them cost-ineffective for industrial practice. Since display issues are intuitively recognizable by humans, their diverse appearances can be abstracted as violations of human commonsense expectations about UI appearance. This paper therefore proposes to reduce the data requirements of display issue detection through commonsense simulation. Although leveraging large vision-language models (VLMs) to replicate human visual ability appears straightforward, off-the-shelf VLMs lack task-specific knowledge of UI design and display correctness. To address this, we fine-tune a VLM to learn what constitutes an expected display and to reason about potential display issues. We term this approach Minuku, a data-efficient UI display issue detector for industrial use. We evaluate the effectiveness of Minuku's design through a set of ablation experiments. Moreover, real-world deployment at one of the largest e-commerce app providers further demonstrates that Minuku effectively detects 40 previously unknown UI display issues and significantly reduces manual effort in industrial settings.