Casual Notebooks and Rigid Scripts: Understanding Data Science Programming
Data scientists who work in fields like statistical analysis and machine learning often use scripting languages like R, Python, or MATLAB, and employ an exploratory programming workflow. Current IDEs offer them three programming modalities: script files, computational notebooks, and consoles. To understand how these modalities impact work practice, we conducted a study with 21 data scientists, and a subsequent larger survey with 62 respondents. Through interviews, walkthroughs, screen recordings, and participants’ analysis files, we collected information about their workflows. Our analysis shows a tension between scripts and computational notebooks. Scripts are more common and better support the storage and execution of previous analyses, but they hamper experimentation. Notebooks better suit the actual data science workflow, but can easily become disorganized. This dual nature of modality usage leads to several issues that affect data scientists’ workflows. We discuss these and other findings, and provide design recommendations for future data science programming IDEs.