RegMiner: Towards Constructing a Large Regression Dataset from Code Evolution History
Thu 21 Jul 2022 09:00 - 09:20 at ISSTA 1 - Session 2-9: Test Generation and Mutation D
Bug datasets lay a significant empirical and experimental foundation for SE/PL research areas such as fault localization, software testing, and program repair. Current well-known datasets are constructed manually, which inevitably limits their scalability, their representativeness, and their support for emerging data-driven research.
In this work, we propose an approach to automate the process of harvesting replicable regression bugs from the code evolution history. We focus on regression bugs, as they (1) manifest how a bug is introduced and fixed (as non-regression bugs do), (2) support regression bug analysis, and (3) incorporate more specification (i.e., both the original passing version and the fixing version) than a non-regression bug dataset for bug analysis. Technically, we address an information retrieval problem on the code evolution history. Given a code repository, we search for regressions where a test passes on a regression-fixing commit, fails on a regression-inducing commit, and passes on a previous working commit. In this work, we address the challenges of (1) identifying potential regression-fixing commits from the code evolution history, (2) migrating the test and its code dependencies over the history, and (3) minimizing the compilation overhead during the regression search. We build our tool, RegMiner, which harvested 1035 regressions over 147 projects in 8 weeks, creating, to the best of our knowledge, the largest replicable regression dataset within the shortest period. Our extensive experiments show that (1) RegMiner can construct the regression dataset with very high precision and acceptable recall, and (2) the constructed regression dataset is of high authenticity and diversity. We foresee that a continuously growing regression dataset opens many data-driven research opportunities in the SE/PL communities.
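The pass/fail/pass criterion in the abstract can be illustrated with a minimal sketch. This is not RegMiner's implementation: it assumes a linear commit history, a test that already runs on every commit (no test migration), and a boolean outcome, and it uses a plain backward scan rather than any compilation-saving search. The names `find_regression` and `passes` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def find_regression(
    history: List[str],
    fixing_index: int,
    passes: Callable[[str], bool],
) -> Optional[Tuple[str, str, str]]:
    """Return (working, inducing, fixing) commits, or None if no regression.

    `history` is ordered oldest to newest; `fixing_index` points at a
    candidate regression-fixing commit; `passes` is a hypothetical oracle
    that reports whether the test passes on a given commit.
    """
    # The test must pass on the candidate fixing commit.
    if not passes(history[fixing_index]):
        return None
    # The commit just before the fix must fail, otherwise nothing was fixed.
    if fixing_index == 0 or passes(history[fixing_index - 1]):
        return None
    # Walk backwards until the test passes again: that commit is the
    # previous working commit, and its successor induced the regression.
    for i in range(fixing_index - 2, -1, -1):
        if passes(history[i]):
            return history[i], history[i + 1], history[fixing_index]
    # The test never passed before the fix: a bug, but not a regression.
    return None


# Toy history: the test passes on c0 and c1, fails on c2 and c3, passes on c4.
outcomes = {"c0": True, "c1": True, "c2": False, "c3": False, "c4": True}
commits = ["c0", "c1", "c2", "c3", "c4"]
result = find_regression(commits, 4, lambda c: outcomes[c])
```

In the toy history, `c1` is the working commit, `c2` the regression-inducing commit, and `c4` the regression-fixing commit. A history in which the test fails on every commit before the fix yields `None`, which corresponds to a non-regression bug.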
Wed 20 Jul (displayed time zone: Seoul)
18:00 - 19:00 | Session 3-3: Test Generation and Mutation C | Technical Papers at ISSTA 1 | Chair(s): Stefan Winter (LMU Munich)
18:00 | 20m Talk | One Step Further: Evaluating Interpreters Using Metamorphic Testing | Technical Papers | Ming Fan (Xi'an Jiaotong University), Jiali Wei (Xi'an Jiaotong University), Wuxia Jin (Xi'an Jiaotong University), Zhou Xu (Wuhan University), Wenying Wei (Xi'an Jiaotong University), Ting Liu (Xi'an Jiaotong University)
18:20 | 20m Talk | Test Mimicry to Assess the Exploitability of Library Vulnerabilities | Technical Papers | Hong Jin Kang (Singapore Management University, Singapore), Truong Giang Nguyen (School of Computing and Information Systems, Singapore Management University), Xuan Bach D. Le (The University of Melbourne), Corina S. Pasareanu (Carnegie Mellon University Silicon Valley, NASA Ames Research Center), David Lo (Singapore Management University)
18:40 | 20m Talk | RegMiner: Towards Constructing a Large Regression Dataset from Code Evolution History | Technical Papers | Xuezhi Song (Fudan University), Yun Lin (National University of Singapore), Siang Hwee Ng (National University of Singapore), Yijian Wu (Fudan University), Xin Peng (Fudan University), Jin Song Dong (National University of Singapore), Hong Mei (Peking University)
Thu 21 Jul (displayed time zone: Seoul)
08:40 - 09:40 | Session 2-9: Test Generation and Mutation D | at ISSTA 1
08:40 | 20m Talk | Finding Bugs in Gremlin-Based Graph Database Systems via Randomized Differential Testing | Technical Papers | Yingying Zheng (Institute of Software, Chinese Academy of Sciences), Wensheng Dou (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences), Yicheng Wang (Institute of Software, Chinese Academy of Sciences), Zheng Qin (Institute of Software, Chinese Academy of Sciences), Lei Tang (Institute of Software, Chinese Academy of Sciences), Yu Gao (Institute of Software, Chinese Academy of Sciences, China), Dong Wang (Institute of Software, Chinese Academy of Sciences), Wei Wang, Jun Wei (Institute of Software at Chinese Academy of Sciences; University of Chinese Academy of Sciences)
09:00 | 20m Talk | RegMiner: Towards Constructing a Large Regression Dataset from Code Evolution History | Technical Papers | Xuezhi Song (Fudan University), Yun Lin (National University of Singapore), Siang Hwee Ng (National University of Singapore), Yijian Wu (Fudan University), Xin Peng (Fudan University), Jin Song Dong (National University of Singapore), Hong Mei (Peking University)
09:20 | 20m Talk | Unicorn: Detect Runtime Error in Time-Series Databases With Hybrid Input Synthesis | Technical Papers | Zhiyong Wu (Tsinghua University, China), Jie Liang (School of Software, Tsinghua University), Mingzhe Wang (Tsinghua University), Chijin Zhou (Tsinghua University), Yu Jiang (Tsinghua University)