VL/HCC 2020
Tue 11 - Fri 14 August 2020 Dunedin, New Zealand
Thu 13 Aug 2020 08:15 - 08:30 at Zoom Room - Data Science Chair(s): Advait Sarkar

Duplicating one’s own code makes it faster to write software. This expediency is particularly valuable for users of computational notebooks, where duplication allows them to quickly test hypotheses and iterate over data. In this paper, we explore how much code is duplicated in computational notebooks, how it is duplicated, and where it comes from, and we identify potential barriers to code reuse. Previous work on computational notebooks describes developers’ motivations for reuse and duplication but does not show how much reuse occurs or which barriers developers face when reusing code. To address this gap, we first analyzed GitHub repositories for code duplicates contained in each repository’s Jupyter notebooks, and then conducted an observational user study of code reuse in which participants solved specific tasks using notebooks. Our findings reveal that repositories in our sample have a median self-duplication rate of 5%. However, in our user study, few participants duplicated their own code, preferring to reuse code from online sources.
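As a rough illustration of what a repository-level self-duplication rate could look like, the sketch below counts code cells that exactly repeat an earlier cell in the same repository, after trimming trailing whitespace and blank lines. This is an assumption made for illustration only: the abstract does not describe the authors' clone-detection method, and the function names `code_cells`, `normalize`, and `self_duplication_rate` are hypothetical. The only external fact relied on is that Jupyter notebooks are JSON documents with a `cells` list whose entries carry `cell_type` and `source`.

```python
import json
from collections import Counter
from typing import List


def code_cells(notebook_json: str) -> List[str]:
    """Return the source text of every code cell in a notebook (nbformat v4 JSON)."""
    nb = json.loads(notebook_json)
    cells = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        src = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines
        if isinstance(src, list):
            src = "".join(src)
        cells.append(src)
    return cells


def normalize(source: str) -> str:
    """Drop trailing whitespace and blank lines so trivial edits do not hide a duplicate."""
    lines = [line.rstrip() for line in source.splitlines()]
    return "\n".join(line for line in lines if line)


def self_duplication_rate(notebooks: List[str]) -> float:
    """Fraction of non-empty code cells in one repository that repeat an earlier cell.

    Assumption: a "duplicate" is an exact match after normalize(); a real
    study would likely also detect near-duplicates.
    """
    counts = Counter()
    for nb in notebooks:
        for src in code_cells(nb):
            key = normalize(src)
            if key:
                counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    duplicates = sum(n - 1 for n in counts.values())  # every copy beyond the first
    return duplicates / total
```

Under these assumptions, a per-repository rate could be computed by passing the contents of each `.ipynb` file in the repository to `self_duplication_rate`, and a median across repositories could then be taken with `statistics.median`.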

Thu 13 Aug

Displayed time zone: Pacific Time (US & Canada)

08:00 - 08:45
Data Science (Research Papers) at Zoom Room
Chair(s): Advait Sarkar (Microsoft Research and University of Cambridge)
08:00
7m
Talk
On Understanding Data Scientists (Short paper)
Research Papers
Paula Pereira (University of Minho), Jácome Cunha (HASLab/INESC TEC & University of Minho), João Paulo Fernandes (University of Coimbra)
08:08
7m
Talk
Casual Notebooks and Rigid Scripts: Understanding Data Science Programming (Short paper)
Research Papers
Krishna Subramanian (RWTH Aachen University), Nur Al-Huda Hamdan (RWTH Aachen University), Jan Borchers (RWTH Aachen University)
08:15
15m
Talk
Code Duplication and Reuse in Jupyter Notebooks (Full paper)
Research Papers
Andreas Koenzen (University of Victoria), Neil Ernst (University of Victoria), Margaret-Anne Storey (University of Victoria)
08:30
15m
Talk
The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry (Full paper)
Research Papers
Sam Lau (University of California San Diego), Ian Drosos (University of California San Diego), Julia Markel (University of California San Diego), Philip Guo (University of California San Diego)