An Empirical Study on Quality Issues of eBay's Big Data SQL Analytics Platform
Fri 13 May 2022 11:05 - 11:10 at ICSE room 1-odd hours - Reliability and Safety 6 | Chair(s): Pasqualina Potena
Big data SQL analytics platforms have become key infrastructure for business data analysis. Compared with costly traditional commercial RDBMSs, scalable solutions built on open-source projects, such as SQL-on-Hadoop, are more popular and attractive to enterprises. At eBay, we built Carmel, a company-wide interactive SQL analytics platform based on Apache Spark. Carmel has been serving thousands of customers from hundreds of teams globally for more than 3 years. Despite the popularity of open-source-based big data SQL analytics platforms, to the best of our knowledge, few empirical studies have examined their quality issues. Yet a deep understanding of these issues, and of the right mitigations for them, is important for reducing manual maintenance effort. To fill this gap, we conduct a comprehensive empirical study of 1,884 real-world quality issues from Carmel. We summarize the common symptoms and identify the root causes with typical cases. Stakeholders including system developers, researchers, and platform maintainers can benefit from our findings and implications. We also present lessons learned from critical cases in our daily practice, as well as insights to motivate automatic tool support and future research directions.
Mon 9 May | Displayed time zone: Eastern Time (US & Canada)
21:00 - 22:00 | Reliability and Safety 4 | Technical Track / NIER - New Ideas and Emerging Results / SEIP - Software Engineering in Practice | ICSE room 2-odd hours | Chair(s): Jonathan Sillito (Brigham Young University)
21:00 | 5m | Talk | Are We Training with The Right Data? Evaluating Collective Confidence in Training Data using Dempster Shafer Theory | NIER - New Ideas and Emerging Results | Pre-print, Media Attached
21:05 | 5m | Talk | Automating Staged Rollout with Reinforcement Learning | NIER - New Ideas and Emerging Results | Shadow Pritchard (University of Tulsa); Vidhyashree Nagaraju (University of Tulsa); Lance Fiondella (University of Massachusetts Dartmouth) | Pre-print, File Attached
21:10 | 5m | Talk | An Empirical Study on Quality Issues of eBay's Big Data SQL Analytics Platform | SEIP - Software Engineering in Practice | Feng Zhu (ebay.Inc); Lijie Xu (Institute of Software, Chinese Academy of Sciences); Gang Ma (ebay.Inc); Shuping Ji (University of Toronto); Jie Wang (Peking University, China / Ant Group, China / Alibaba Group, China); Gang Wang (ebay.Inc); Hongyi Zhang (ebay.Inc); Kun Wan (ebay.Inc); Mingming Wang (ebay.Inc); Xingchao Zhang (ebay.Inc); Yuming Wang (ebay.Inc); Jingpin Li (ebay.Inc) | DOI, Pre-print
21:15 | 5m | Talk | PerfSig: Extracting Performance Bug Signatures via Multi-modality Causal Analysis | Technical Track | Jingzhu He (ShanghaiTech University); Yuhang Lin (North Carolina State University); Xiaohui Gu (North Carolina State University); Chin-Chia Michael Yeh (Visa Research); Zhongfang Zhuang (Visa Research) | DOI, Pre-print, Media Attached
21:20 | 5m | Talk | TOGA: A Neural Method for Test Oracle Generation (Distinguished Paper Award) | Technical Track | Elizabeth Dinella; Gabriel Ryan (Columbia University, USA); Todd Mytkowicz (Microsoft Research); Shuvendu K. Lahiri (Microsoft Research) | DOI, Pre-print, Media Attached
21:25 | 5m | Talk | Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning | Technical Track | Renjue Li (Institute of Software at Chinese Academy of Sciences, China); Pengfei Yang (Institute of Software at Chinese Academy of Sciences, China); Cheng-Chao Huang (Nanjing Institute of Software Technology, ISCAS); Youcheng Sun (The University of Manchester); Bai Xue (Institute of Software at Chinese Academy of Sciences, China); Lijun Zhang (Institute of Software, Chinese Academy of Sciences) | Pre-print, Media Attached