ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Fri 17 Apr 2026 14:15 - 14:30 at Oceania I - Analytics 4 Chair(s): Diomidis Spinellis

Computational notebooks have become the primary coding environment for data scientists. Despite their popularity, research on notebook code quality is still in its infancy, and the code shared in notebooks is often of poor quality. Because maintenance and reuse depend on it, it is crucial to examine the understandability of notebook code and to identify the notebook metrics that play a significant role in it. Code understandability is a qualitative variable closely tied to the user’s opinion of the code. Traditional approaches to measuring it either use limited questionnaires that review a few code pieces or rely on metadata such as likes and votes in software repositories. Our approach improves the measurement of Jupyter notebook understandability by leveraging user opinions about code understandability within a software repository. As a case study, we started with 542,051 Kaggle Jupyter notebooks, compiled in a dataset named DistilKaggle, which we introduced in our previous research. To identify user comments associated with code understandability, we used a fine-tuned DistilBERT transformer. We established a user-opinion-based criterion for measuring code understandability that considers the number of understandability-related comments, the upvotes on those comments, and the notebook’s total views. We refer to this criterion as User Opinion Code Understandability (UOCU), which proved much more effective than previous approaches. A hybrid approach combining UOCU with total upvotes further improved the criterion. Additionally, we trained machine learning models to classify notebook understandability based solely on notebook metrics. Using the hybrid criterion, we collected 34 metrics for 132,723 final notebooks. Our predictive model, built using a Random Forest classifier, achieved 89% accuracy in classifying code understandability levels in computational notebooks.
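The abstract names the three inputs to UOCU (understandability-related comments, upvotes on those comments, and total views) but does not give the formula. The sketch below is only an illustration of how such signals could be combined; the `NotebookStats` type, the per-view normalization, and the `weight` parameter of the hybrid score are all assumptions made here, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class NotebookStats:
    """Hypothetical per-notebook counts; field names are illustrative."""
    understandability_comments: int  # comments classified as understandability-related
    comment_upvotes: int             # upvotes on those comments
    views: int                       # total views of the notebook
    total_upvotes: int               # upvotes on the notebook itself


def uocu_score(nb: NotebookStats) -> float:
    """Assumed UOCU-style score: opinion signal normalized by views."""
    if nb.views <= 0:
        return 0.0  # no exposure, so no opinion signal can be measured
    return (nb.understandability_comments + nb.comment_upvotes) / nb.views


def hybrid_score(nb: NotebookStats, weight: float = 0.5) -> float:
    """Assumed hybrid: weighted mix of the UOCU-style score and upvotes per view."""
    if nb.views <= 0:
        return 0.0
    upvote_rate = nb.total_upvotes / nb.views
    return weight * uocu_score(nb) + (1.0 - weight) * upvote_rate


example = NotebookStats(
    understandability_comments=4,
    comment_upvotes=6,
    views=1000,
    total_upvotes=50,
)
print(uocu_score(example), hybrid_score(example))
```

Normalizing by views is one plausible way to keep widely viewed notebooks from dominating the score; the published criterion may weight or scale the inputs differently.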

Fri 17 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
Analytics 4: Research Track / Journal-first Papers at Oceania I
Chair(s): Diomidis Spinellis AUEB & TU Delft
14:00
15m
Talk
Back to the Roots: Assessing Mining Techniques for Java Vulnerability-Contributing Commits
Journal-first Papers
Torge Hinrichs Hamburg University of Technology, Emanuele Iannone Hamburg University of Technology, Tamás Aladics University of Szeged, Peter Hegedus University of Szeged, Andrea De Lucia University of Salerno, Fabio Palomba University of Salerno, Riccardo Scandariato Hamburg University of Technology
14:15
15m
Talk
Predicting the Understandability of Computational Notebooks through Code Metrics Analysis (Virtual Attendance)
Journal-first Papers
Mojtaba Mostafavi Sharif University of Technology, Alireza Asadi Department of Computer Engineering of Sharif University of Technology, Arash Asgari York University, Bardia Mohammadi Sharif University of Technology, Abbas Heydarnoori Bowling Green State University
14:30
15m
Talk
How Configurable is the Linux Kernel? Analyzing Two Decades of Feature-Model History
Journal-first Papers
Elias Kuiter University of Magdeburg, Chico Sundermann TU Braunschweig, Thomas Thüm TU Braunschweig, Tobias Heß University of Ulm, Sebastian Krieter TU Braunschweig, Germany, Gunter Saake University of Magdeburg, Germany
14:45
15m
Talk
Breaking Strong Encapsulation: A Comprehensive Study of Java Module Abuse
Research Track
Yirui He University of California, Irvine, Yongbo Chen University of California, Irvine, Jessy Ayala University of California, Irvine, Yecheng Zhou University of California, Irvine, Qiran Wang University of California, Irvine, Joshua Garcia University of California, Irvine
15:00
15m
Talk
Causal or Correlational? A Cohort Study on the Effects of Code Smells on Class Change- and Fault-Proneness
Research Track
Sabato Nocera University of Salerno, Sira Vegas Universidad Politecnica de Madrid, Giuseppe Scanniello University of Salerno, Massimiliano Di Penta University of Sannio, Italy, Natalia Juristo Universidad Politecnica de Madrid
15:15
15m
Talk
Six Million (Suspected) Fake Stars on GitHub: A Growing Spiral of Popularity Contests, Spams, and Malware
Research Track
Hao He Carnegie Mellon University, Haoqin Yang Carnegie Mellon University, Philipp Burckhardt Socket, Inc, Alexandros Kapravelos NCSU, Bogdan Vasilescu Carnegie Mellon University, Christian Kästner Carnegie Mellon University