Restoring the Executability of Jupyter Notebooks by Automatic Upgrade of Deprecated APIs
Data scientists typically practice exploratory programming using computational notebooks, to comprehend new data and extract insights. To do this they iteratively refine their code, actively trying to reuse and re-purpose solutions created by other data scientists, in real time. However, recent studies have shown that a vast majority of publicly available notebooks cannot be executed out of the box. One of the prominent reasons is the deprecation of data science APIs used in such notebooks, due to the rapid evolution of data science libraries. In this work, we propose RELANCER, an automatic technique that restores the executability of broken Jupyter Notebooks, in near real time, by upgrading deprecated APIs. RELANCER employs an iterative runtime error-driven approach to identify and fix one API issue at a time. This is supported by a machine-learned model which uses the runtime error message to predict the kind of API repair needed - an update in API or package name, a parameter, or a parameter value. Then RELANCER creates a search space of candidate repairs by combining knowledge from API migration examples on GitHub as well as the API documentation and employs a second machine-learned model to rank this space of candidate mappings. An evaluation of RELANCER on a curated dataset of 255 un-executable Jupyter Notebooks from Kaggle shows that RELANCER can successfully restore the executability of 56% of the subjects, while baselines relying on just GitHub examples and just API documentation can only fix 37% and 36% of the subjects respectively. Further, pursuant to its real-time use case, RELANCER can restore execution to 48% of subjects within a 5-minute time limit, while a baseline lacking its machine learning models can only fix 24%.
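The iterative, error-driven loop described in the abstract can be illustrated with a small Python sketch. This is not RELANCER's implementation: the regular-expression rules stand in for the learned error classifier, the hand-written `CANDIDATES` table stands in for the ranked search space mined from GitHub and API documentation, and `fakelib`, `old_mean`, and `vals` are hypothetical names invented for the example.

```python
import re
import types

# Hypothetical stand-in for an upgraded library: `old_mean` was renamed to
# `mean`, and its keyword argument `vals` was renamed to `values`.
def _mean(values):
    return sum(values) / len(values)

fakelib = types.SimpleNamespace(mean=_mean)

# Rule-based stand-in for a learned model that maps an error message to a
# repair kind (assumption: two kinds are enough for this toy example).
PATTERNS = {
    "api_name": re.compile(r"has no attribute '(\w+)'"),
    "parameter": re.compile(r"unexpected keyword argument '(\w+)'"),
}

# Hypothetical candidate repairs, listed best-first as if already ranked.
CANDIDATES = {
    "api_name": {"old_mean": ["average", "mean"]},
    "parameter": {"vals": ["data", "values"]},
}

def run(source):
    """Execute the cell; return None on success, else the error message."""
    try:
        exec(source, {"fakelib": fakelib})
        return None
    except Exception as exc:
        return str(exc)

def diagnose(error):
    """Return (repair kind, offending identifier), or (None, None)."""
    for kind, pattern in PATTERNS.items():
        match = pattern.search(error)
        if match:
            return kind, match.group(1)
    return None, None

def repair(source, max_steps=10):
    """Fix one API issue per iteration until the cell runs or no fix is found."""
    for _ in range(max_steps):
        error = run(source)
        if error is None:
            return source
        kind, old = diagnose(error)
        for new in CANDIDATES.get(kind, {}).get(old, []):
            patched = re.sub(rf"\b{re.escape(old)}\b", new, source)
            new_error = run(patched)
            # Accept the candidate unless it is itself the new offender.
            if new_error is None or diagnose(new_error) != (kind, new):
                source = patched
                break
        else:
            return None  # no candidate resolved the current error
    return None

print(repair("result = fakelib.old_mean(vals=[1, 2, 3])"))
```

In this toy run, the first iteration repairs the API name and the second repairs the parameter name, mirroring the "one API issue at a time" strategy. Re-executing after every candidate is the expensive step, which is one reason ranking the candidates well matters under a time limit.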
Thu 18 Nov (displayed time zone: Hobart)
09:00 - 10:00 | Testing II | Research Papers at Koala | Chair(s): Rui Abreu (Faculty of Engineering, University of Porto, Portugal)
09:00 | 20m | Talk | Nekara: Generalized Concurrency Testing | Research Papers | Udit Agarwal (IIIT Delhi), Pantazis Deligiannis (Microsoft Research), Cheng Huang (Microsoft), Kumseok Jung (University of British Columbia), Akash Lal (Microsoft Research), Immad Naseer (Microsoft), Matthew J. Parkinson (Microsoft Research, UK), Arun Thangamani (Microsoft Research), Jyothi Vedurada (IIT Hyderabad), Yunpeng Xiao (Microsoft)
09:20 | 20m | Talk | QDiff: Differential Testing of Quantum Software Stacks | Research Papers | Jiyuan Wang (University of California at Los Angeles), Qian Zhang (University of California at Los Angeles), Guoqing Harry Xu (University of California at Los Angeles), Miryung Kim (University of California at Los Angeles, USA)
09:40 | 20m | Talk | Restoring the Executability of Jupyter Notebooks by Automatic Upgrade of Deprecated APIs | Research Papers | Chenguang Zhu (University of Texas at Austin), Ripon Saha (Fujitsu Laboratories of America, Inc.), Mukul Prasad (Fujitsu Research of America), Sarfraz Khurshid (The University of Texas at Austin)