VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation
The rapid advancement of generative AI and multi-modal foundation models has shown significant potential in advancing robotic manipulation. Vision-language-action (VLA) models, in particular, have emerged as a promising approach for visuomotor control by leveraging large-scale vision-language data and robot demonstrations. However, current VLA models are typically evaluated using a limited set of hand-crafted scenes, leaving their general performance and robustness in diverse scenarios largely unexplored. To address this gap, we present VLATest, a fuzzing framework designed to generate robotic manipulation scenes for testing VLA models. Based on VLATest, we conducted an empirical study to assess the performance of seven representative VLA models. Our results revealed that current VLA models lack the robustness necessary for practical deployment. Additionally, we investigated the impact of various factors, including the number of obstacles, lighting conditions, camera poses, and unseen objects, on VLA models' performance. Our findings highlight the limitations of existing VLA models, emphasizing the need for further research to develop reliable and trustworthy VLA applications.
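The abstract describes VLATest as a fuzzing framework that perturbs manipulation scenes (e.g., obstacle count, lighting, camera pose) and measures how often a VLA model still completes the task. The sketch below is a minimal, hypothetical illustration of that idea, not VLATest's actual implementation: every name (Scene, fuzz_scene, run_episode, evaluate), every parameter range, and the simulator/model hook are assumptions introduced here for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical scene description; VLATest's real scene schema is not shown in this listing.
@dataclass
class Scene:
    task_instruction: str
    target_object: str
    obstacles: List[str]
    lighting_intensity: float                 # relative brightness, 1.0 = default
    camera_pose: Tuple[float, float, float]   # (x, y, z) offset from the default view


def fuzz_scene(task_instruction: str, target_object: str,
               object_pool: List[str], rng: random.Random) -> Scene:
    """Randomly perturb the factors studied in the paper:
    obstacle count, lighting, and camera pose (ranges are made up)."""
    n_obstacles = rng.randint(0, 4)
    return Scene(
        task_instruction=task_instruction,
        target_object=target_object,
        obstacles=rng.sample(object_pool, k=n_obstacles),
        lighting_intensity=rng.uniform(0.25, 2.0),
        camera_pose=(rng.uniform(-0.1, 0.1),
                     rng.uniform(-0.1, 0.1),
                     rng.uniform(-0.05, 0.05)),
    )


def run_episode(model, scene: Scene) -> bool:
    """Placeholder hook: render `scene` in a simulator, roll out the VLA model,
    and return whether the manipulation task succeeded."""
    raise NotImplementedError


def evaluate(model, n_scenes: int = 100, seed: int = 0) -> float:
    """Success rate of `model` over a batch of fuzzed scenes."""
    rng = random.Random(seed)  # seeded RNG so the generated scenes are reproducible
    object_pool = ["can", "bottle", "sponge", "cube", "mug", "spoon"]
    successes = 0
    for _ in range(n_scenes):
        scene = fuzz_scene("pick up the can and place it on the plate",
                           "can", object_pool, rng)
        if run_episode(model, scene):
            successes += 1
    return successes / n_scenes
```

Seeding the generator is one plausible way to keep a fuzzed test suite reproducible across the seven models being compared; the actual paper should be consulted for how scenes are generated and success is judged.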
Mon 23 Jun · Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
16:00 - 18:00 | Testing 2 (Journal First / Research Papers) at Cosmos 3A | Chair(s): Miryung Kim (UCLA and Amazon Web Services)

16:00 (20m) Talk | Search-based DNN Testing and Retraining with GAN-enhanced Simulations (Journal First)
Mohammed Attaoui (University of Luxembourg), Fabrizio Pastore (University of Luxembourg), Lionel Briand (University of Ottawa, Canada; Lero Centre, University of Limerick, Ireland)

16:20 (20m) Talk | TEASMA: A Practical Methodology for Test Adequacy Assessment of Deep Neural Networks (Journal First)
Amin Abbasishahkoo (School of EECS, University of Ottawa), Mahboubeh Dadkhah (University of Ottawa), Lionel Briand (University of Ottawa, Canada; Lero Centre, University of Limerick, Ireland), Dayi Lin (Centre for Software Excellence, Huawei Canada)

16:40 (20m) Talk | VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation (Research Papers)
Zhijie Wang (University of Alberta), Zhehua Zhou (University of Alberta, Canada), Jiayang Song (University of Alberta), Yuheng Huang (The University of Tokyo), Zhan Shu (University of Alberta), Lei Ma (The University of Tokyo & University of Alberta)
DOI · Pre-print

17:00 (20m) Talk | DRWASI: LLM-assisted Differential Testing for WebAssembly System Interface Implementations (Journal First)
Yixuan Zhang (Peking University), Ningyu He (Hong Kong Polytechnic University), Jianting Gao (Huazhong University of Science and Technology), Shangtong Cao (Beijing University of Posts and Telecommunications), Kaibo Liu (Peking University), Haoyu Wang (Huazhong University of Science and Technology), Yun Ma (Peking University), Gang Huang (Peking University), Xuanzhe Liu (Peking University)

17:20 (20m) Talk | MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases (Journal First)
Congying Xu (The Hong Kong University of Science and Technology, China), Valerio Terragni (University of Auckland), Hengcheng Zhu (The Hong Kong University of Science and Technology), Jiarong Wu, Shing-Chi Cheung (Hong Kong University of Science and Technology)

17:40 (20m) Talk | UnitCon: Synthesizing Targeted Unit Tests for Java Runtime Exceptions (Research Papers)
Sujin Jang (KAIST), Yeonhee Ryou (KAIST), Heewon Lee (KAIST, South Korea), Kihong Heo (KAIST)
DOI
Cosmos 3A is the first room in the Cosmos 3 wing.
When facing the main Cosmos Hall, the entrance to the Cosmos 3 wing is on the left, close to the stairs. The area is accessed through a large door marked “3”, which will stay open during the event.