Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code
Data scientists reportedly spend 60 to 80 percent of their working time on data wrangling, i.e., cleaning data and extracting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adapting existing code; such bugs do not cause crashes but reduce model quality. To support data scientists with data wrangling, we present a technique to generate interactive documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data, based on execution information collected with tailored dynamic program analysis. We demonstrate that a JupyterLab extension implementing our technique can provide documentation for many cells in popular notebooks. In a user study, we find that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.
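To make the example-selection idea more concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a hypothetical wrangling function `clean_age` that is manually instrumented to record which branch each input takes, and it approximates dynamic analysis plus test case selection by grouping inputs by their recorded branch trace and keeping the first input seen for each trace. The names `clean_age` and `select_representatives` are illustrative only.

```python
def clean_age(raw, trace):
    """Hypothetical wrangling step: parse an age string.

    Appends a label for the branch taken to `trace`, standing in for
    execution information a dynamic analysis would collect automatically.
    """
    if raw is None or raw == "":
        trace.append("missing")
        return 0  # missing values silently become 0
    if raw.strip().isdigit():
        trace.append("numeric")
        return int(raw)  # int() tolerates surrounding whitespace
    trace.append("non-numeric")
    return 0  # values like "n/a" also silently become 0


def select_representatives(rows, transform):
    """Keep the first (input, output) pair for each distinct branch trace."""
    examples = {}  # dicts preserve insertion order (Python 3.7+)
    for row in rows:
        trace = []
        output = transform(row, trace)
        path = tuple(trace)
        if path not in examples:
            examples[path] = (row, output)
    return examples


if __name__ == "__main__":
    rows = ["34", "", "41", "n/a", " 27 ", None]
    for path, (row, output) in select_representatives(rows, clean_age).items():
        print(path, repr(row), "->", output)
```

On these sample inputs, the sketch reduces six rows to three representatives, one per branch. Seeing them side by side makes it easy to notice that both the empty string and "n/a" are mapped to 0, which is the kind of non-crashing bug the abstract describes. A real tool would collect the traces automatically rather than relying on hand-written branch labels.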
Wed 17 Nov (displayed time zone: Hobart)
08:00 - 09:00 | Bugs I | Research Papers / Industry Showcase / Tool Demonstrations | at Kangaroo | Chair(s): Elena Sherman (Boise State University)
08:00 (20 min) | Research paper | On the Real-World Effectiveness of Static Bug Detectors at Finding Null Pointer Exceptions | Research Papers | David A Tomassi (University of California, Davis), Cindy Rubio-González (University of California at Davis)
08:20 (20 min) | Talk | Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code | Research Papers | Chenyang Yang (Peking University), Shurui Zhou (University of Toronto), Jin L.C. Guo (McGill University), Christian Kästner (Carnegie Mellon University)
08:40 (10 min) | Talk | Reducing Time-To-Fix For Fuzzer Bugs | Industry Showcase | Rui Abreu (Faculty of Engineering, University of Porto, Portugal), Franjo Ivančić (Google), Filip Niksic (Google), Hadi Ravanbakhsh (Google), Ramesh Viswanathan (Google)
08:50 (5 min) | Talk | Shaker: a Tool for Detecting More Flaky Tests Faster | Tool Demonstrations | Marcello Cordeiro, Denini Silva, Leopoldo Teixeira, Breno Miranda, Marcelo d'Amorim (all Federal University of Pernambuco)