When Less is More: On the Value of ''Co-training'' for Semi-Supervised Software Defect Predictors
Abstract: Labeling a module as defective or non-defective is an expensive task. Hence, there are often limits on how much labeled data is available for training. Semi-supervised classifiers use far fewer labels for training, but there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to name a few. Only a handful of these methods have been tested in SE for (e.g.) predicting defects, and even then, those tests have been run on just a handful of projects.
This paper takes a wide range of 55 semi-supervised learners and applies them to over 714 projects. We find that semi-supervised "co-training methods" work significantly better than other approaches. However, co-training needs to be used with caution, since the specific co-training method must be carefully selected based on a user's specific goals. Also, we warn that a commonly-used co-training method ("multi-view", where different learners get different sets of columns) does not improve predictions, while adding substantially to the runtime cost (11 hours vs. 1.8 hours).
Those cautions stated, we find that using these "co-trainers", we can label just 2.5% of the data, then make predictions that are competitive with those using 100% of the data. It is an open question, worthy of future work, whether these reductions can be seen in other areas of software analytics.
All code and datasets analysed during the current study are available at https://GitHub.com/Suvodeep90/Semi_Supervised_Methods.
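The "co-training" named in the title can be illustrated with a minimal, self-contained sketch (not the paper's actual implementation): two learners, each given a different feature view, iteratively pseudo-label their most confident unlabeled modules for each other. The `CentroidLearner` class, the `co_train` helper, the column split, and all data below are illustrative assumptions; note that the abstract itself warns that the multi-view variant (different learners on different columns) did not improve predictions in their experiments.

```python
class CentroidLearner:
    """Toy nearest-centroid classifier (a stand-in for any real learner)."""
    def __init__(self, cols):
        self.cols = cols  # the feature columns ("view") this learner sees

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [[x[c] for c in self.cols] for x, lbl in zip(X, y) if lbl == label]
            self.centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
        return self

    def _dist(self, x, label):
        cen = self.centroids[label]
        return sum((x[c] - cen[j]) ** 2 for j, c in enumerate(self.cols)) ** 0.5

    def predict_with_conf(self, x):
        ranked = sorted((self._dist(x, lbl), lbl) for lbl in self.centroids)
        margin = ranked[1][0] - ranked[0][0]  # larger gap = more confident
        return ranked[0][1], margin


def co_train(X_lab, y_lab, X_unlab, rounds=5, k=2):
    """Two learners on different feature views pseudo-label data for each other."""
    a, b = CentroidLearner(cols=[0]), CentroidLearner(cols=[1])
    Xa, ya = list(X_lab), list(y_lab)
    Xb, yb = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        if not pool:
            break
        a.fit(Xa, ya)
        b.fit(Xb, yb)
        # each learner moves its k most confident pool items to the OTHER learner
        for learner, X_other, y_other in ((a, Xb, yb), (b, Xa, ya)):
            ranked = sorted(pool, key=lambda x: -learner.predict_with_conf(x)[1])
            for x in ranked[:k]:
                label, _ = learner.predict_with_conf(x)
                X_other.append(x)
                y_other.append(label)
                pool.remove(x)
    return a.fit(Xa, ya), b.fit(Xb, yb)
```

The key design point this sketch captures is that each learner's confident guesses enlarge the *other* learner's training set, which is how co-training stretches a small labeled seed (e.g. the paper's 2.5%) over a large unlabeled pool.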
Fri 19 Apr (times shown in the Lisbon time zone)

14:00-15:30 | Testing with and for AI 2 (Journal-first Papers / Research Track / Demonstrations), room Sophia de Mello Breyner Andresen. Chair: João Pascoal Faria (Faculty of Engineering, University of Porto and INESC TEC)

14:00 (15m talk, Research Track) Large Language Models are Edge-Case Generators: Crafting Unusual Programs for Fuzzing Deep Learning Libraries. Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Zhang, Shujing Yang, Lingming Zhang (University of Illinois Urbana-Champaign)

14:15 (15m talk, Research Track) Deeply Reinforcing Android GUI Testing with Deep Reinforcement Learning. Yuanhong Lan, Yifei Lu (Nanjing University), Zhong Li, Minxue Pan (Nanjing University), Wenhua Yang (Nanjing University of Aeronautics and Astronautics), Tian Zhang, Xuandong Li (Nanjing University)

14:30 (7m talk, Journal-first Papers) Black-Box Testing of Deep Neural Networks through Test Case Diversity. Zohreh Aghababaeyan (University of Ottawa), Manel Abdellatif (École de Technologie Supérieure), Lionel Briand (University of Ottawa; Lero centre, University of Limerick), Ramesh S, Mojtaba Bagherzadeh (Cisco)

14:37 (7m talk, Journal-first Papers) scenoRITA: Generating Diverse, Fully Mutable, Test Scenarios for Autonomous Vehicle Planning. Yuqi Huai, Sumaya Almanee, Yuntianyi Chen, Xiafa Wu, Qi Alfred Chen, Joshua Garcia (University of California, Irvine)

14:44 (7m talk, Journal-first Papers) InterEvo-TR: Interactive Evolutionary Test Generation with Readability Assessment. Pedro Delgado-Pérez (Universidad de Cádiz), Aurora Ramírez (University of Córdoba), Kevin Jesús Valle-Gómez, Inmaculada Medina-Bulo (Universidad de Cádiz), José Raúl Romero (University of Córdoba)

14:51 (7m talk, Journal-first Papers) Differential Testing for Machine Learning: An Analysis for Classification Algorithms beyond Deep Learning

14:58 (7m talk, Journal-first Papers) Syntactic vs. Semantic Similarity of Artificial and Real Faults in Mutation Testing Studies. Milos Ojdanic (University of Luxembourg), Aayush Garg (Luxembourg Institute of Science and Technology), Ahmed Khanfir (University of Luxembourg), Renzo Degiovanni (Luxembourg Institute of Science and Technology), Mike Papadakis, Yves Le Traon (University of Luxembourg)

15:05 (7m talk, Journal-first Papers) Causality-driven Testing of Autonomous Driving Systems. Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, Stefano Russo (Università di Napoli Federico II)

15:12 (7m talk, Journal-first Papers) When Less is More: On the Value of ''Co-training'' for Semi-Supervised Software Defect Predictors. Suvodeep Majumder (North Carolina State University), Joymallya Chakraborty (Amazon.com), Tim Menzies (North Carolina State University). Pre-print available.

15:19 (7m talk, Demonstrations) OpenSBT: A Modular Framework for Search-based Testing of Automated Driving Systems. Lev Sorokin, Tiziano Munaro, Damir Safin (fortiss), Brian Hsuan-Cheng Liao, Adam Molin (DENSO AUTOMOTIVE)