Casual Notebooks and Rigid Scripts: Understanding Data Science Programming (VL/HCC 2020 - Research Papers)

Who

Krishna Subramanian, Nur Al-Huda Hamdan, Jan Borchers

Track

VL/HCC 2020 Research Papers

Time Zone

The program is currently displayed in (GMT-07:00) Pacific Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 13 Aug 2020, 08:08 - 08:15, at Zoom Room (Data Science session). Chair(s): Advait Sarkar

Abstract

Data scientists who work in fields like statistical analysis and machine learning often use scripting languages like R, Python, or MATLAB, and employ an exploratory programming workflow. Current IDEs offer them three programming modalities: script files, computational notebooks, and consoles. To understand how these modalities impact work practice, we conducted a study with 21 data scientists, and a subsequent larger survey with 62 respondents. Through interviews, walkthroughs, screen recordings, and participants’ analysis files, we collected information about their workflows. Our analysis shows a tension between scripts and computational notebooks. Scripts are more common, better support storage and execution of previous analyses, but hamper experimentation. Notebooks better suit the actual data science workflow, but can become easily unorganized. This dual nature of modality usage leads to several issues that affect data scientists’ workflows. We discuss these and other findings, and provide design recommendations for future data science programming IDEs.

Authorizer Link

https://ieeexplore.ieee.org/document/9127207/

File attachments

Casual Notebooks and Rigid Scripts: Understanding Data Science Programming -- Slides (Notebooks and Scripts.pdf)	17.66MiB

Krishna Subramanian, RWTH Aachen University, Germany

Nur Al-Huda Hamdan, RWTH Aachen University, Germany

Jan Borchers, RWTH Aachen University, Germany

Session Program

Thu 13 Aug
Displayed time zone: Pacific Time (US & Canada)

08:00 - 08:45	Data Science (Research Papers) at Zoom Room. Chair(s): Advait Sarkar, Microsoft Research and University of Cambridge

08:00 7m Talk	On Understanding Data Scientists (Short paper, Research Papers). Paula Pereira, University of Minho; Jácome Cunha, HASLab/INESC TEC & University of Minho; João Paulo Fernandes, University of Coimbra
08:08 7m Talk	Casual Notebooks and Rigid Scripts: Understanding Data Science Programming (Short paper, Research Papers). Krishna Subramanian, RWTH Aachen University; Nur Al-Huda Hamdan, RWTH Aachen University; Jan Borchers, RWTH Aachen University
08:15 15m Talk	Code Duplication and Reuse in Jupyter Notebooks (Full paper, Research Papers). Andreas Koenzen, University of Victoria; Neil Ernst, University of Victoria; Margaret-Anne Storey, University of Victoria
08:30 15m Talk	The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry (Full paper, Research Papers). Sam Lau, University of California San Diego; Ian Drosos, University of California San Diego; Julia Markel, University of California San Diego; Philip Guo, University of California San Diego