ASE 2021
Sun 14 - Sat 20 November 2021 Australia
Wed 17 Nov 2021 08:20 - 08:40 at Kangaroo - Bugs I Chair(s): Elena Sherman

Data scientists reportedly spend 60 to 80 percent of their time on data wrangling, i.e., cleaning data and extracting features. However, data wrangling code is often repetitive and error-prone to write. Moreover, it is easy to introduce subtle bugs when reusing and adopting existing code; such bugs do not cause crashes but reduce model quality. To support data scientists with data wrangling, we present a technique to generate interactive documentation for data wrangling code. We use (1) program synthesis techniques to automatically summarize data transformations and (2) test case selection techniques to purposefully select representative examples from the data, based on execution information collected with a tailored dynamic program analysis. We demonstrate that a JupyterLab extension implementing our technique can provide documentation for many cells in popular notebooks, and we find in a user study that users with our plugin are faster and more effective at finding realistic bugs in data wrangling code.
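To make the idea of representative examples concrete, here is a minimal sketch in plain Python. It is not the authors' technique or their JupyterLab extension: the function names (`clean_age`, `classify`, `representative_examples`), the sample data, and the three behavior classes are all hypothetical, and the simple before/after comparison only stands in for the dynamic program analysis mentioned in the abstract. The sketch shows how grouping inputs by how a transformation treated them, then showing one example per group, can surface a bug that never crashes.

```python
def clean_age(raw):
    """Hypothetical wrangling step: parse an age string into an int."""
    try:
        return int(raw.strip())
    except ValueError:
        # Subtle bug: malformed values silently become 0 instead of being flagged.
        return 0


def classify(raw, cleaned):
    """Label how the step treated one value (a crude stand-in for execution information)."""
    if str(cleaned) == raw:
        return "unchanged"
    if cleaned == 0 and raw.strip() != "0":
        return "defaulted"
    return "normalized"


def representative_examples(values, transform):
    """Return the first (input, output) pair seen for each behavior class."""
    examples = {}
    for raw in values:
        cleaned = transform(raw)
        examples.setdefault(classify(raw, cleaned), (raw, cleaned))
    return examples


# Hypothetical column of raw values, as it might appear in a notebook cell.
ages = ["34", " 27 ", "41", "n/a", "19", "unknown"]
summary = representative_examples(ages, clean_age)
for label, (before, after) in summary.items():
    print(f"{label}: {before!r} -> {after!r}")
```

In this sketch the "defaulted" group exposes that `'n/a'` was turned into `0`, which a reader skimming only the first few rows of the data would likely miss.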

Wed 17 Nov

08:00 - 09:00
Bugs I (Research Papers / Industry Showcase / Tool Demonstrations) at Kangaroo
Chair(s): Elena Sherman, Boise State University
