Exploring the Jupyter Ecosystem: An Empirical Study of Bugs and Vulnerabilities (ESEIW 2025 - ESEM - Technical Track)

Sun 28 September - Fri 3 October 2025

Who

Wenyuan Jiang, Diany Pressato, Harsh Darji, Thibaud Lutellier

Track

ESEIW 2025 ESEM - Technical Track

Abstract

Background. Jupyter notebooks are one of the main tools used by data scientists. Notebooks include features (configuration scripts, markdown, images, etc.) that make them more challenging to analyze than traditional software. As a result, existing software engineering models, tools, and studies do not capture the unique behavior of notebooks.

Aims. This paper aims to provide a large-scale empirical study of bugs and vulnerabilities in the notebook ecosystem.

Method. Our quantitative analysis of notebooks from two sources (GitHub and Kaggle) indicates that, because configuration scripts, Python code, documentation, and output are combined in the same document, notebooks are subject to many unique types of bugs that make notebook projects hard to maintain. We also propose a new taxonomy of bugs in Jupyter notebooks, derived from a qualitative analysis.

Results. Our findings highlight that configuration issues are among the most common bugs in notebook documents, followed by incorrect API usage. Finally, we explore common vulnerabilities associated with popular deployment frameworks to better understand the risks of notebook development.

Conclusions. This work highlights that notebooks are less well supported than traditional software, resulting in more complex code, misconfiguration, and poor maintenance.

Link to Preprint

https://arxiv.org/abs/2507.18833


Wenyuan Jiang, ETH Zürich, Switzerland
Diany Pressato, Concordia University, Canada
Harsh Darji, University of Alberta, Canada
Thibaud Lutellier, University of Alberta, Canada
