Deep Just-in-Time Defect Prediction: How Far Are We?
Sat 17 Jul 2021, 10:30 - 10:50, at ISSTA 1, Session 27 (time band 3): Bugs and Analysis 2. Chair(s): Mike Papadakis
Defect prediction aims to automatically identify potentially defective code with minimal human intervention and has been widely studied in the literature. Just-in-Time (JIT) defect prediction focuses on program changes rather than whole programs, and has been widely adopted in continuous testing. CC2Vec, a state-of-the-art JIT defect prediction tool, first constructs a hierarchical attention network (HAN) to learn distributed vector representations of both code additions and deletions. It then concatenates these with two other embedding vectors, extracted by the existing DeepJIT approach and representing commit messages and overall code changes, and trains a model to predict whether a given commit is defective. Although CC2Vec has been shown to be the state of the art for JIT defect prediction, it was evaluated only on a limited dataset and was not compared with all representative baselines.
Therefore, to further investigate the efficacy and limitations of CC2Vec, this paper performs an extensive study of CC2Vec on a large-scale dataset of over 310,370 changes (8.3× larger than the original CC2Vec dataset). We also empirically compare CC2Vec against DeepJIT and representative traditional JIT defect prediction techniques. The experimental results show that CC2Vec cannot consistently outperform DeepJIT, and neither of them can consistently outperform traditional JIT defect prediction. We also investigate the impact of individual traditional defect prediction features and find that the added-line-number feature outperforms the other traditional features. Inspired by this finding, we construct a simplistic JIT defect prediction approach that applies a logistic regression classifier to the added-line-number feature alone. Surprisingly, this simplistic approach can outperform CC2Vec and DeepJIT in defect prediction, and can be 81k×/120k× faster in training/testing.
Furthermore, the paper provides various practical guidelines for advancing JIT defect prediction in the near future.
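To make the simplistic baseline above concrete, here is a minimal sketch of logistic regression over a single added-line-number feature. It is not the authors' implementation: the log scaling of the feature, the plain gradient-descent trainer, the helper names (`la_feature`, `fit_logistic_regression`, `predict_proba`, `auc`) and the toy commits are all hypothetical assumptions chosen for illustration. Commits are ranked with ROC AUC, a common metric in JIT defect prediction.

```python
import math
from typing import List, Tuple


def la_feature(added_lines: int) -> float:
    # ASSUMPTION: log-scale the raw count of added lines so very large commits
    # do not dominate training; the paper's exact preprocessing may differ.
    return math.log1p(added_lines)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(xs: List[float], ys: List[int],
                            lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(defective | x) = sigmoid(w*x + b) by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: float, b: float, added_lines: int) -> float:
    # Probability that a commit with this many added lines is defect-inducing.
    return sigmoid(w * la_feature(added_lines) + b)


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC: chance that a random defective commit outranks a random clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


# HYPOTHETICAL toy commits: (lines added, 1 = defect-inducing). Not data from the paper.
commits = [(2, 0), (5, 0), (8, 0), (12, 1), (40, 0), (90, 1), (150, 1), (300, 1)]
xs = [la_feature(a) for a, _ in commits]
ys = [y for _, y in commits]

w, b = fit_logistic_regression(xs, ys)
scores = [predict_proba(w, b, a) for a, _ in commits]
print(f"w={w:.3f} b={b:.3f} train AUC={auc(scores, ys):.4f}")
```

With a single feature and a positive learned weight, the model's ranking of commits is simply the ranking by added lines, so on this toy set the training AUC is 15/16 = 0.9375 (only the clean 40-line commit outranks a defective one). This also hints at why such a baseline is cheap: training and inference touch one number per commit.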
Thu 15 Jul
Displayed time zone: Brussels, Copenhagen, Madrid, Paris
19:00 - 20:20 | Session 11 (time band 1): Machine Learning and Testing | Technical Papers at ISSTA 1 | Chair(s): August Shi (University of Texas at Austin)
19:00 20m Talk | Interval Constraint-Based Mutation Testing of Numerical Specifications | Technical Papers | Clothilde Jeangoudoux (MPI-SWS), Eva Darulova (MPI-SWS), Christoph Lauter (University of Alaska at Anchorage)
19:20 20m Talk | Predoo: Precision Testing of Deep Learning Operators | Technical Papers | Xufan Zhang (Nanjing University), Ning Sun (Nanjing University), Chunrong Fang (Nanjing University), Jiawei Liu (Nanjing University), Jia Liu (Nanjing University), Dong Chai (Huawei), Jiang Wang (Huawei), Zhenyu Chen (Nanjing University)
19:40 20m Talk | TERA: Optimizing Stochastic Regression Tests in Machine Learning Projects | Technical Papers | Saikat Dutta (University of Illinois at Urbana-Champaign), Jeeva Selvam (University of Illinois at Urbana-Champaign), Aryaman Jain (University of Illinois at Urbana-Champaign), Sasa Misailovic (University of Illinois at Urbana-Champaign)
20:00 20m Talk | Deep Just-in-Time Defect Prediction: How Far Are We? | Technical Papers | Zhengran Zeng (Southern University of Science and Technology), Yuqun Zhang (Southern University of Science and Technology), Haotian Zhang (Kwai), Lingming Zhang (University of Illinois at Urbana-Champaign)
Sat 17 Jul
09:30 - 11:10 | Session 27 (time band 3): Bugs and Analysis 2 | Technical Papers at ISSTA 1 | Chair(s): Mike Papadakis (University of Luxembourg, Luxembourg)
09:30 20m Talk | Faster, Deeper, Easier: Crowdsourcing Diagnosis of Microservice Kernel Failure from User Space | Technical Papers | Yicheng Pan (Peking University), Meng Ma (Peking University), Xinrui Jiang (Peking University), Ping Wang (Peking University)
09:50 20m Talk | Finding Data Compatibility Bugs with JSON Subschema Checking (Distinguished Artifact) | Technical Papers | Andrew Habib (SnT, University of Luxembourg), Avraham Shinnar (IBM Research), Martin Hirzel (IBM Research), Michael Pradel (University of Stuttgart)
10:10 20m Talk | Semantic Table Structure Identification in Spreadsheets | Technical Papers | Yakun Zhang (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Xiao Lv (Microsoft Research), Haoyu Dong (Microsoft Research), Wensheng Dou (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Shi Han (Microsoft Research), Dongmei Zhang (Microsoft Research), Jun Wei (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Dan Ye (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
10:30 20m Talk | Deep Just-in-Time Defect Prediction: How Far Are We? | Technical Papers | Zhengran Zeng (Southern University of Science and Technology), Yuqun Zhang (Southern University of Science and Technology), Haotian Zhang (Kwai), Lingming Zhang (University of Illinois at Urbana-Champaign)
10:50 20m Talk | Continuous Test Suite Failure Prediction | Technical Papers