Extended Reality Cybersickness Assessment via User Review Analysis
In recent years, the extended reality (XR) software ecosystem has emerged as the next ubiquitous computing platform, as it provides users with immersive, interactive experiences. However, the XR ecosystem suffers from cybersickness, which can severely affect user comfort and safety and cause symptoms such as headaches and disorientation. This makes effective cybersickness assessment a timely and important problem. State-of-the-art methods for assessing the cybersickness of XR software typically monitor users' biological metrics during XR usage. These methods rely heavily on manual playtesting and therefore scale poorly. User reviews on XR app stores are an informative source from which developers can learn how users rate the cybersickness of their apps and why. Nevertheless, the number of user reviews is too large for developers to analyze manually, and most current automatic user review analysis methods provide only coarse-grained or abstract results, such as a few high-level topic groups discussed in the reviews. Recent advances in Large Language Models (LLMs) may bring new opportunities. However, directly applying LLMs to evaluate XR cybersickness is challenging because LLMs perform poorly when processing large numbers of short texts, and their context window is limited. To address these challenges, we introduce XRCare, a comprehensive framework that automates cybersickness assessment and root-cause reasoning for XR apps through fine-grained user review analysis.
XRCare consists of three main phases: (1) insight pool construction, in which XRCare collects cybersickness analysis chains and the corresponding analysis results from domain experts; (2) reasoning graph construction, in which XRCare dynamically extracts and categorizes the reasons, mentioned in user reviews, that make users feel cybersick, and maintains them in a self-evolving hierarchical graph; and (3) multi-agent deductive cybersickness reasoning, in which a multi-agent system simulates diverse user demographics to rate the intensity of cybersickness and analyze its causes. This structured approach allows XRCare to systematically identify, categorize, and address instances of cybersickness. For our experiments, we construct a large-scale dataset of 685,111 user reviews from 9,667 XR apps. Our evaluation shows that XRCare improves the F1-score by 17.46% over the best-performing baseline and by 28.03% on average across all baselines, while also providing more accurate and detailed interpretability insights.
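To make phases (2) and (3) more concrete, the following is a minimal, hypothetical Python sketch and not XRCare's actual implementation. It assumes that an upstream step (in XRCare, an LLM) has already mapped each review to a category-to-reason path. The names `ReasonNode`, `ReasonGraph`, and `aggregate_ratings`, the example category labels, and the demographic weighting scheme are all illustrative assumptions. The graph grows whenever an unseen category or reason appears, and per-persona intensity ratings are combined with a weighted mean.

```python
from dataclasses import dataclass, field


@dataclass
class ReasonNode:
    """Hypothetical node in a cause hierarchy (category -> specific reason)."""
    name: str
    children: dict = field(default_factory=dict)
    evidence: list = field(default_factory=list)  # ids of supporting reviews

    def count(self) -> int:
        # Total number of supporting reviews in this subtree.
        return len(self.evidence) + sum(c.count() for c in self.children.values())


class ReasonGraph:
    """Self-extending tree: unseen categories or reasons are added on first sight."""

    def __init__(self):
        self.root = ReasonNode("cybersickness")

    def add(self, path, review_id):
        # Walk (and create, if missing) each level of the path, then attach evidence.
        node = self.root
        for name in path:
            node = node.children.setdefault(name, ReasonNode(name))
        node.evidence.append(review_id)

    def top_reasons(self, k=3):
        # Collect leaf reasons with their review counts, most frequent first.
        leaves = []

        def walk(node, prefix):
            if not node.children:
                leaves.append((" > ".join(prefix), len(node.evidence)))
            for child in node.children.values():
                walk(child, prefix + [child.name])

        walk(self.root, [])
        return sorted(leaves, key=lambda item: (-item[1], item[0]))[:k]


def aggregate_ratings(ratings, weights):
    """Weighted mean of per-persona intensity ratings (weights are assumed shares)."""
    total = sum(weights[p] for p in ratings)
    return sum(ratings[p] * weights[p] for p in ratings) / total


if __name__ == "__main__":
    graph = ReasonGraph()
    graph.add(["locomotion", "smooth artificial locomotion"], "r1")
    graph.add(["locomotion", "smooth artificial locomotion"], "r2")
    graph.add(["rendering", "low frame rate"], "r3")
    graph.add(["locomotion", "snap turning absent"], "r4")
    print(graph.top_reasons(2))
    print(aggregate_ratings({"novice": 4.0, "experienced": 2.0},
                            {"novice": 0.25, "experienced": 0.75}))
```

In this sketch the hierarchy lets per-category counts (via `count()`) and fine-grained leaf reasons be read off the same structure. A real system would also need to merge near-duplicate reasons, which this toy version does not attempt.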