SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation
This program is tentative and subject to change.
High-quality labeled datasets are crucial for training and evaluating foundation models in software engineering, but creating them is often prohibitively expensive and labor-intensive. We introduce SPICE, a scalable, automated pipeline for labeling SWE-bench-style datasets with annotations for issue clarity, test coverage, and effort estimation. SPICE combines context-aware code navigation, rationale-driven prompting, and multi-pass consensus to produce labels that closely approximate expert annotations. SPICE’s design was informed by our own experience and frustration in labeling more than 800 tasks from SWE-Gym. SPICE achieves strong agreement with human-labeled SWE-bench Verified data while reducing the cost of labeling 1,000 instances from around $100,000 (manual annotation) to just $5.10. These results demonstrate SPICE’s potential to enable cost-effective, large-scale dataset creation for SE-focused FMs.
Mon 17 Nov, 14:00 - 15:30 (displayed time zone: Seoul)
14:00 | 10m | Talk | LAURA: Enhancing Code Review Generation with Context-Enriched Retrieval-Augmented LLM | Research Papers | Yuxin Zhang (Beijing Institute of Technology); Yuxia Zhang (Beijing Institute of Technology); Zeyu Sun (Institute of Software, Chinese Academy of Sciences); Yanjie Jiang (Peking University); Hui Liu (Beijing Institute of Technology)
14:10 | 10m | Talk | AlertGuardian: Intelligent Alert Life-Cycle Management for Large-scale Cloud Systems | Research Papers | Guangba Yu (The Chinese University of Hong Kong); Genting Mai (Sun Yat-sen University); Rui Wang (Tencent); Ruipeng Li (Tencent); Pengfei Chen (Sun Yat-sen University); Long Pan (Tencent); Ruijie Xu (Tencent)
14:20 | 10m | Talk | SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation | Research Papers | Aaditya Bhatia (Queen's University); Gustavo Oliva (Centre for Software Excellence, Huawei Canada); Gopi Krishnan Rajbahadur (Centre for Software Excellence, Huawei, Canada); Haoxiang Zhang (Huawei); Yihao Chen (Center for Software Excellence, Huawei Canada); Zhilong Chen (Center for Software Excellence, Huawei Canada); Arthur Leung (Center for Software Excellence, Huawei Canada); Dayi Lin (Centre for Software Excellence, Huawei Canada); Boyuan Chen (Centre for Software Excellence, Huawei Canada); Ahmed E. Hassan (Queen’s University)
14:30 | 10m | Talk | Managing the variability of a logistics robotic system | Journal-First Track |
14:40 | 10m | Talk | Sprint2Vec: A Deep Characterization of Sprints in Iterative Software Development | Journal-First Track | Morakot Choetkiertikul (Mahidol University, Thailand); Peerachai Banyongrakkul (Mahidol University); Chaiyong Rakhitwetsagul (Mahidol University, Thailand); Suppawong Tuarob (Mahidol University); Hoa Khanh Dam (University of Wollongong); Thanwadee Sunetnanta (Mahidol University)
14:50 | 10m | Talk | Supporting Emotional Intelligence, Productivity and Team Goals while Handling Software Requirements Changes | Journal-First Track | Kashumi Madampe (Monash University, Australia); Rashina Hoda (Monash University); John Grundy (Monash University)
15:00 | 10m | Talk | Rechecking Recheck Requests in Continuous Integration: An Empirical Study of OpenStack | Research Papers | Yelizaveta Brus (University of Waterloo); Rungroj Maipradit (University of Waterloo); Earl T. Barr (University College London); Shane McIntosh (University of Waterloo)
15:10 | 10m | Talk | An LLM-based multi-agent framework for agile effort estimation | Research Papers | Long Bui (University of Wollongong); Hoa Khanh Dam (University of Wollongong); Rashina Hoda (Monash University)
15:20 | 10m | Talk | From Characters to Structure: Rethinking Real-Time Collaborative Programming Models | Research Papers |