ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil
Fri 17 Apr 2026 15:00 - 15:15 at Europa II - AI for Software Engineering 25 Chair(s): Daniel Feitosa

Computational notebooks empower data scientists to explore data, perform analytics, and share their findings. During data exploration, the scientist uses notebooks to construct and refine data pipelines that process data in multiple stages. Extracting pipelines from a given notebook is useful in understanding the notebook’s semantics and in migrating it to production systems. However, the nature of the data exploration process, and the lack of sufficient documentation in the notebook, present two challenges in extracting the pipelines. First, notebook cells can be executed in any order, making it difficult to capture the data flow between pipeline stages. Second, data transformation operations belonging to a stage may not be cleanly separated, making it difficult to extract cohesive pipeline components.

In this paper, we propose NB2P, a novel system that automatically extracts data science pipelines from notebooks. Given an input notebook, NB2P first parses it into an Abstract Syntax Tree (AST). It then performs analysis on the AST to recover the execution order, thereby addressing the first challenge above. Next, it groups the data operations into pipeline stages based on their semantics. This step is called semantic segmentation, and it addresses the second challenge using a tree-based, learned encoding-decoding algorithm that captures the data flow and fine-grained hierarchical information in the notebook. Finally, NB2P assembles the stages and constructs the final pipeline that can be deployed into production systems.
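As an illustration of the first two steps, the sketch below parses each cell with Python's standard `ast` module and derives one valid execution order from which cell defines the names another cell reads. It is a simple def-use heuristic, not NB2P's actual analysis. The helper names `cell_defs_uses` and `recover_order` are hypothetical, and the sketch assumes each variable is assigned in only one cell.

```python
import ast
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def cell_defs_uses(source):
    """Return (defined, used) names for one cell's source code."""
    defined, used = set(), set()
    for stmt in ast.parse(source).body:
        loads, stores = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stores.add(node.id)
                else:
                    loads.add(node.id)
            elif isinstance(node, ast.alias):
                # `import pandas as pd` binds `pd`; `import os.path` binds `os`.
                stores.add((node.asname or node.name).split(".")[0])
            elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
                stores.add(node.name)
        # A name read before this cell assigns it must come from elsewhere.
        used |= loads - defined
        defined |= stores
    return defined, used


def recover_order(cells):
    """Order cell indices so each cell runs after the cells it reads from."""
    info = [cell_defs_uses(src) for src in cells]
    definer = {}
    for idx, (defined, _) in enumerate(info):
        for name in defined:
            # Assumption: one defining cell per name; the first one wins.
            definer.setdefault(name, idx)
    graph = {idx: set() for idx in range(len(cells))}
    for idx, (_, used) in enumerate(info):
        for name in used:
            # Names nobody defines (builtins, missing helpers) are ignored.
            if name in definer and definer[name] != idx:
                graph[idx].add(definer[name])
    # The graph maps each cell to its predecessors, which are emitted first.
    return list(TopologicalSorter(graph).static_order())


cells = [
    "model = fit(clean)",
    "import pandas as pd\nraw = pd.read_csv('data.csv')",
    "clean = raw.dropna()",
]
print(recover_order(cells))  # [1, 2, 0]
```

Cells are only parsed, never executed, so the undefined `fit` and the missing `data.csv` do not matter here. A notebook that reassigns the same variable in several cells breaks the single-definer assumption and would need a finer analysis than this sketch provides.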

We train NB2P on a large notebook corpus from Kaggle. We compare NB2P against baselines that use state-of-the-art large language models and other machine learning models for source code segmentation. The experimental results show that NB2P consistently outperforms the baselines while incurring low overhead.

Fri 17 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
AI for Software Engineering 25 - Journal-first Papers / Research Track / New Ideas and Emerging Results (NIER) / Demonstrations at Europa II
Chair(s): Daniel Feitosa University of Groningen
14:00
15m
Talk
ArtifactSync: Automated Repository Synchronization through Hierarchical Change Impact Analysis
Demonstrations
Ebube Alor Concordia University, João Pedro de Souza Olivo Tardivo Universidade Estadual do Paraná, SayedHassan Khatoonabadi Concordia University, Emad Shihab Concordia University
14:15
15m
Talk
Introducing Phylogenetics in Search-based Software Engineering: Phylogenetics-aware SBSE
Journal-first Papers
Daniel Blasco SVIT Research Group. Universidad San Jorge, Antonio Iglesias Universidad San Jorge, Jorge Echeverria Universidad San Jorge, Francisca Perez Universitat Politècnica de València, Carlos Cetina
14:30
15m
Talk
Automating Terraform Code Migration through Provider Evolution Knowledge (Virtual Attendance)
New Ideas and Emerging Results (NIER)
Pranjal Gupta IBM Research, Pooja Aggarwal IBM Research, Brent Paulovicks IBM Research, Prateeti Mohapatra IBM Research, Rong Lee IBM Research, Vadim Sheinin IBM Research
14:45
15m
Talk
Replacing Training with Reasoning: Reinterpreting Classic ML Pipelines with LLMs
New Ideas and Emerging Results (NIER)
Marco Alecci University of Luxembourg, Jordan Samhi University of Luxembourg, Luxembourg, Tegawendé F. Bissyandé University of Luxembourg, Jacques Klein University of Luxembourg
15:00
15m
Talk
NB2P: Generating Data Science Pipelines from Computational Notebooks (Virtual Attendance)
Research Track
Haotian Gao National University of Singapore, Singapore and NUSRI Chongqing, China, Quang Trung Ta National University of Singapore, Tien Tuan Anh Dinh Deakin University, Australia, Nhut Minh Ho National University of Singapore, Zhiyong Huang National University of Singapore, Beng Chin Ooi National University of Singapore, Singapore
15:15
15m
Talk
Multi-Location Software Model Completion
Research Track
Alisa Carla Welter Saarland University, Christof Tinnes Siemens AG, Sven Apel Saarland University