VL/HCC 2020
Tue 11 - Fri 14 August 2020 Dunedin, New Zealand
Thu 13 Aug 2020 08:08 - 08:15 at Zoom Room - Data Science Chair(s): Advait Sarkar

Data scientists who work in fields like statistical analysis and machine learning often use scripting languages like R, Python, or MATLAB, and employ an exploratory programming workflow. Current IDEs offer them three programming modalities: script files, computational notebooks, and consoles. To understand how these modalities impact work practice, we conducted a study with 21 data scientists, and a subsequent larger survey with 62 respondents. Through interviews, walkthroughs, screen recordings, and participants’ analysis files, we collected information about their workflows. Our analysis shows a tension between scripts and computational notebooks. Scripts are more common and better support storage and execution of previous analyses, but hamper experimentation. Notebooks better suit the actual data science workflow, but can easily become disorganized. This dual nature of modality usage leads to several issues that affect data scientists’ workflows. We discuss these and other findings, and provide design recommendations for future data science programming IDEs.

Casual Notebooks and Rigid Scripts: Understanding Data Science Programming -- Slides (Notebooks and Scripts.pdf), 17.66 MiB

Thu 13 Aug
Times are displayed in time zone: Pacific Time (US & Canada)

08:00 - 08:45: Data Science (Research Papers) at Zoom Room
Chair(s): Advait Sarkar (Microsoft Research and University of Cambridge)
08:00 - 08:07
On Understanding Data Scientists (Short paper)
Research Papers
Paula Pereira (University of Minho), Jácome Cunha (HASLab/INESC TEC & University of Minho), João Paulo Fernandes (University of Coimbra)
08:08 - 08:15
Casual Notebooks and Rigid Scripts: Understanding Data Science Programming (Short paper)
Research Papers
Krishna Subramanian (RWTH Aachen University), Nur Al-Huda Hamdan (RWTH Aachen University), Jan Borchers (RWTH Aachen University)
08:15 - 08:30
Code Duplication and Reuse in Jupyter Notebooks (Full paper)
Research Papers
Andreas Koenzen (University of Victoria), Neil Ernst (University of Victoria), Margaret-Anne Storey (University of Victoria)
08:30 - 08:45
The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry (Full paper)
Research Papers
Sam Lau (University of California San Diego), Ian Drosos (University of California San Diego), Julia Markel (University of California San Diego), Philip Guo (University of California San Diego)