Safe-DS: A Domain Specific Language to Make Data Science Safe
Because Data Science (DS) pipelines run for a long time, even small programming mistakes can be very costly if they are not detected statically. However, even basic static type checking of DS pipelines is difficult because most are written in Python. Static typing is available in Python only via external linters. These require static type annotations for parameters or results of functions, which many DS libraries do not provide.
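The dependence of external checkers on annotations can be illustrated with a minimal Python sketch. The functions `train_untyped` and `train_typed` are hypothetical and stand in for library functions; they are not taken from the paper or from any real DS library.

```python
from typing import get_type_hints


def train_untyped(data, epochs):
    """No annotations: a static checker has nothing to verify a call against."""
    return [row * epochs for row in data]


def train_typed(data: list[float], epochs: int) -> list[float]:
    """Annotated: an external checker can compare argument types to these hints."""
    return [row * epochs for row in data]


# The type information that is visible for each function.
untyped_hints = get_type_hints(train_untyped)  # empty: nothing to check
typed_hints = get_type_hints(train_typed)      # data, epochs, return

# Passing a string for `epochs` to the unannotated function is only
# discovered when the offending line actually executes.
try:
    train_untyped([1.0, 2.0], "3")
    failed_at_runtime = False
except TypeError:
    failed_at_runtime = True
```

In a long-running pipeline, the failing call might sit behind hours of preprocessing, which is why detecting it before execution matters.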
In this paper, we show how the wealth of Python DS libraries can be used in a statically safe way via Safe-DS, a domain-specific language (DSL) for DS. Safe-DS catches conventional type errors plus errors related to range restrictions, data manipulation, and call order of functions, going well beyond the abilities of current Python linters. Python libraries are integrated into Safe-DS via a stub language for specifying the interfaces of their declarations, and an API-Editor that can extract type information from the code and documentation of Python libraries and automatically generate suitable stubs.
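Two of the error classes named above, range restrictions and call order, can be sketched in plain Python. This is not Safe-DS syntax, and `ToyClassifier`, `NotFittedError`, and `run_pipeline` are hypothetical names introduced only for illustration. The sketch shows what these mistakes look like when they are caught only at run time, which is the situation a static check is meant to avoid.

```python
from typing import Optional


class NotFittedError(Exception):
    """Raised when predict() is called before fit()."""


class ToyClassifier:
    """Hypothetical model used only to illustrate the two error classes."""

    def __init__(self, learning_rate: float) -> None:
        # Range restriction: the type `float` alone cannot express 0 < lr <= 1.
        if not 0.0 < learning_rate <= 1.0:
            raise ValueError(f"learning_rate must be in (0, 1], got {learning_rate}")
        self.learning_rate = learning_rate
        self._threshold: Optional[float] = None

    def fit(self, values: list[float]) -> "ToyClassifier":
        # Toy "training": remember the mean of the inputs.
        self._threshold = sum(values) / len(values)
        return self

    def predict(self, value: float) -> bool:
        # Call-order restriction: fit() must have run first.
        if self._threshold is None:
            raise NotFittedError("call fit() before predict()")
        return value > self._threshold


def run_pipeline(learning_rate: float, fit_first: bool) -> str:
    """Run a tiny pipeline and report either the result or the run-time error."""
    try:
        model = ToyClassifier(learning_rate)
        if fit_first:
            model.fit([1.0, 2.0, 3.0])
        return f"prediction: {model.predict(2.5)}"
    except (ValueError, NotFittedError) as error:
        return f"{type(error).__name__}: {error}"
```

Both faulty calls, `run_pipeline(1.5, True)` and `run_pipeline(0.1, False)`, pass a conventional type check because every argument has the declared type. Only the value range and the order of calls are wrong.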
Moreover, Safe-DS complements textual DS pipelines with a graphical representation that eases safe development by preventing syntax errors. The seamless synchronization of the textual and graphical views lets developers always choose the one best suited to their skills and current task.
We think that Safe-DS can make DS development easier, faster, and more reliable, significantly reducing development costs.
Wed 17 May (displayed time zone: Hobart)
15:45 - 17:15 | Development and evolution of AI-intensive systems | SEIP - Software Engineering in Practice / Technical Track / NIER - New Ideas and Emerging Results | Meeting Room 104 | Chair(s): Sebastian Elbaum (University of Virginia)
15:45 | 15m Talk | Reusing Deep Neural Network Models through Model Re-engineering | Technical Track | Binhang Qi (Beihang University), Hailong Sun (Beihang University), Xiang Gao (Beihang University, China), Hongyu Zhang (The University of Newcastle), Zhaotian Li (Beihang University), Xudong Liu (Beihang University)
16:00 | 15m Talk | PyEvolve: Automating Frequent Code Changes in Python ML Systems | Technical Track | Malinda Dilhara (University of Colorado Boulder, USA), Danny Dig (JetBrains Research & University of Colorado Boulder, USA), Ameya Ketkar (Uber) | Pre-print
16:15 | 15m Talk | DeepArc: Modularizing Neural Networks for the Model Maintenance | Technical Track | Xiaoning Ren, Yun Lin (Shanghai Jiao Tong University; National University of Singapore), Yinxing Xue (University of Science and Technology of China), Ruofan Liu (National University of Singapore), Jun Sun (Singapore Management University), Zhiyong Feng (Tianjin University), Jin Song Dong (National University of Singapore)
16:30 | 15m Talk | Decomposing a Recurrent Neural Network into Modules for Enabling Reusability and Replacement | Technical Track | Sayem Mohammad Imtiaz (Iowa State University), Fraol Batole (Dept. of Computer Science, Iowa State University), Astha Singh (Dept. of Computer Science, Iowa State University), Rangeet Pan (IBM Research), Breno Dantas Cruz (Dept. of Computer Science, Iowa State University), Hridesh Rajan (Iowa State University) | Pre-print
16:45 | 7m Talk | Safe-DS: A Domain Specific Language to Make Data Science Safe | NIER - New Ideas and Emerging Results | Pre-print
16:52 | 7m Talk | Rapid Development of Compositional AI | NIER - New Ideas and Emerging Results | Lee Martie (MIT-IBM Watson AI Lab), Jessie Rosenberg (IBM), Veronique Demers (MIT-IBM Watson AI Lab), Gaoyuan Zhang (IBM), Onkar Bhardwaj (MIT-IBM Watson AI Lab), John Henning (IBM), Aditya Prasad (IBM), Matt Stallone (MIT-IBM Watson AI Lab), Ja Young Lee (IBM), Lucy Yip (IBM), Damilola Adesina (IBM), Elahe Paikari (IBM), Oscar Resendiz (IBM), Sarah Shaw (IBM), David Cox (IBM) | Pre-print
17:00 | 7m Talk | StreamAI: Challenges of Continual Learning Systems in Production for AI Industrialization | SEIP - Software Engineering in Practice | Mariam Barry (BNP Paribas), Albert Bifet (University of Waikato, Institut Polytechnique de Paris), Jean Luc Billy (BNP Paribas)