ICSE 2023
Sun 14 - Sat 20 May 2023 Melbourne, Australia
Fri 19 May 2023 15:00 - 15:07 at Meeting Room 104 - Software development tools Chair(s): Xing Hu

Data scientists often use notebooks to develop data science (DS) pipelines, particularly because notebooks allow parts of the pipeline to be executed selectively. However, notebooks for DS have many well-known flaws. We focus on the following ones in this paper: (1) Notebooks can become littered with code cells that are not part of the main DS pipeline but exist solely to make decisions (e.g., listing the columns of a tabular dataset). (2) While users are allowed to execute cells in any order, not every ordering is correct, because a cell can depend on declarations from other cells. (3) After a cell is changed, that cell and all cells that depend on the changed declarations must be rerun. (4) Changes to external values necessitate partial re-execution of the notebook. (5) Since cells are the smallest unit of execution, code that is unaffected by changes can inadvertently be re-executed.
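Flaws (2), (3), and (5) can be made concrete with a small sketch. It is an illustration only, not the authors' tooling: it assumes cells are plain Python strings executed top to bottom, and the helper names `defs_and_uses` and `stale_cells` are hypothetical. It uses Python's standard `ast` module to find which names a cell assigns and reads, then propagates a change forward to find the cells that must be rerun.

```python
import ast

def defs_and_uses(source):
    """Return (names assigned, names read) for one cell.

    Simplification: comprehension variables and builtins are counted too.
    """
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            else:
                uses.add(node.id)
    return defs, uses

def stale_cells(cells, changed):
    """Indices of cells to rerun after editing cell `changed`.

    Assumes cells run top to bottom; a cell is stale if it reads a name
    whose value may have changed.
    """
    dirty = set(defs_and_uses(cells[changed])[0])
    stale = [changed]
    for i in range(changed + 1, len(cells)):
        defs, uses = defs_and_uses(cells[i])
        if uses & dirty:
            stale.append(i)
            dirty |= defs  # everything this cell assigns may now differ
    return stale

# Hypothetical notebook: one statement per cell for readability.
cells = [
    "rows = [1, 2, 3, 4]",
    "limit = 2",
    "kept = [r for r in rows if r > limit]",
    "total = sum(rows)",
    "report = len(kept)",
]

print(stale_cells(cells, 1))  # [1, 2, 4]
```

Editing the cell that defines `limit` makes cells 2 and 4 stale but leaves cell 3 untouched. If cells 2 and 3 were merged into a single cell, `total` would be recomputed as well, which is the unnecessary re-execution described in (5).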

To solve these issues, we propose to replace cells as the basis for the selective execution of DS pipelines. Instead, we suggest populating a context menu for variables with actions fitting their type (such as listing columns if the variable is a tabular dataset). These actions are executed based on a data-flow analysis to ensure that dependencies between variables are respected and results are updated properly after changes. Our solution separates pipeline code from decision-making code and automates dependency management, thus reducing clutter and the risk of errors.
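The idea of offering actions that fit a variable's type could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation presented in the talk: the `Table` class, the `ACTIONS` registry, the `action` decorator, and `menu_for` are all hypothetical names introduced here.

```python
from dataclasses import dataclass

@dataclass
class Table:
    """Hypothetical stand-in for a tabular dataset."""
    columns: list
    rows: list

# Registry mapping a type to the labelled actions offered for its values.
ACTIONS = {}

def action(for_type, label):
    """Decorator that registers a function as a menu action for a type."""
    def register(fn):
        ACTIONS.setdefault(for_type, {})[label] = fn
        return fn
    return register

@action(Table, "List columns")
def list_columns(table):
    return list(table.columns)

@action(Table, "Count rows")
def count_rows(table):
    return len(table.rows)

@action(list, "Show length")
def show_length(values):
    return len(values)

def menu_for(value):
    """Collect the actions whose registered type matches the value."""
    menu = {}
    for registered_type, entries in ACTIONS.items():
        if isinstance(value, registered_type):
            menu.update(entries)
    return menu

data = Table(columns=["age", "income"], rows=[[31, 52000], [45, 61000]])
menu = menu_for(data)
print(sorted(menu))                 # ['Count rows', 'List columns']
print(menu["List columns"](data))   # ['age', 'income']
```

Because the exploratory action lives in the menu rather than in the pipeline source, listing the columns leaves no extra code behind, which is the separation of pipeline code from decision-making code that the abstract describes.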

Fri 19 May

Displayed time zone: Hobart

13:45 - 15:15
13:45
15m
Talk
Safe low-level code without overhead is practical
Technical Track
Pre-print
14:00
15m
Talk
Sibyl: Improving Software Engineering Tools with SMT Selection (Distinguished Paper Award)
Technical Track
Will Leeson University of Virginia, Matthew B Dwyer University of Virginia, Antonio Filieri AWS and Imperial College London
Pre-print
14:15
15m
Talk
Make Your Tools Sparkle with Trust: The PICSE Framework for Trust in Software Tools
SEIP - Software Engineering in Practice
Brittany Johnson George Mason University, Christian Bird Microsoft Research, Denae Ford Microsoft Research, Nicole Forsgren Microsoft Research, Thomas Zimmermann Microsoft Research
Pre-print
14:30
15m
Talk
CoCoSoDa: Effective Contrastive Learning for Code Search
Technical Track
Ensheng Shi Xi'an Jiaotong University, Wenchao Gu The Chinese University of Hong Kong, Yanlin Wang School of Software Engineering, Sun Yat-sen University, Lun Du Microsoft Research Asia, Hongyu Zhang The University of Newcastle, Shi Han Microsoft Research, Dongmei Zhang Microsoft Research, Hongbin Sun Xi'an Jiaotong University
Pre-print
14:45
7m
Talk
Task Context: A Tool for Predicting Code Context Models for Software Development Tasks
DEMO - Demonstrations
Yifeng Wang Zhejiang University, Yuhang Lin Zhejiang University, Zhiyuan Wan Zhejiang University, Xiaohu Yang Zhejiang University
Pre-print Media Attached
14:52
7m
Talk
Continuously Accelerating Research
NIER - New Ideas and Emerging Results
Sergey Mechtaev University College London, Jonathan Bell Northeastern University, Christopher Steven Timperley Carnegie Mellon University, Earl T. Barr University College London, Michael Hilton Carnegie Mellon University
Pre-print
15:00
7m
Talk
An Alternative to Cells for Selective Execution of Data Science Pipelines
NIER - New Ideas and Emerging Results
Lars Reimann University of Bonn, Günter Kniesel-Wünsche University of Bonn
Pre-print
15:07
7m
Talk
pytest-inline: An Inline Testing Tool for Python
DEMO - Demonstrations
Yu Liu University of Texas at Austin, Zachary Thurston Cornell University, Alan Han Cornell University, Pengyu Nie University of Texas at Austin, Milos Gligoric University of Texas at Austin, Owolabi Legunsen Cornell University