The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large
Fri 13 May 2022 11:25 - 11:30 at ICSE room 2-odd hours - Software Architecture and Design 3 Chair(s): Grace Lewis
Wed 25 May 2022 13:45 - 13:50 at Room 301+302 - Papers 9: Requirements, Design and App Analysis Chair(s): Rick Kazman
An increasingly large number of software systems today include data science components for descriptive, predictive, and prescriptive analytics. The collection of data science stages, from acquisition to cleaning/curation to modeling and so on, is referred to as a data science pipeline. To facilitate research and practice on data science pipelines, it is essential to understand their nature. What are the typical stages of a data science pipeline? How are they connected? Do pipelines differ between their theoretical representations and their use in practice? Today we do not fully understand these architectural characteristics of data science pipelines. In this work, we present a three-pronged comprehensive study to answer these questions for the state of the art, data science in-the-small, and data science in-the-large. Our study analyzes three datasets: a collection of 71 proposals for data science pipelines and related concepts in theory, a collection of over 105 implementations of curated data science pipelines from Kaggle competitions to understand data science in-the-small, and a collection of 21 mature data science projects from GitHub to understand data science in-the-large. Our study has led to three representations of data science pipelines that capture the essence of our subjects in theory, in-the-small, and in-the-large.
Wed 25 May (displayed time zone: Eastern Time, US & Canada)
13:30 - 15:00 | Papers 9: Requirements, Design and App Analysis | SEIS - Software Engineering in Society / Technical Track / Journal-First Papers / NIER - New Ideas and Emerging Results | at Room 301+302 | Chair(s): Rick Kazman (University of Hawai‘i at Mānoa)
13:30 | 5m Talk | How Templated Requirements Specifications Inhibit Creativity in Software Engineering | Journal-First Papers | Rahul Mohanani (University of Jyväskylä), Paul Ralph (Dalhousie University), Burak Turhan (University of Oulu), Vladimir Mandić (Faculty of Technical Sciences, University of Novi Sad)
13:35 | 5m Talk | How to Debug Inclusivity Bugs? A Debugging Process with Information Architecture | SEIS - Software Engineering in Society | Mariam Guizani (Oregon State University), Igor Steinmacher (Northern Arizona University), Jillian Emard (Oregon State University), Abrar Fallatah (Oregon State University), Margaret Burnett (Oregon State University), Anita Sarma (Oregon State University)
13:40 | 5m Talk | Towards a Reference Software Architecture for Human-AI Teaming in Smart Manufacturing | NIER - New Ideas and Emerging Results | Philipp Haindl (Software Competence Center Hagenberg), Georg Buchgeher (Software Competence Center Hagenberg), Maqbool Khan (Software Competence Center Hagenberg), Bernhard Moser (Software Competence Center Hagenberg)
13:45 | 5m Talk | The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large | Technical Track | Sumon Biswas (Carnegie Mellon University), Mohammad Wardat (Dept. of Computer Science, Iowa State University), Hridesh Rajan (Iowa State University)
13:50 | 5m Talk | DescribeCtx: Context-Aware Description Synthesis for Sensitive Behaviors in Mobile Apps | Technical Track | Shao Yang (Case Western Reserve University), Yuehan Wang (Nanjing University), Yuan Yao (Nanjing University), Haoyu Wang (Huazhong University of Science and Technology, China), Yanfang Ye (Case Western Reserve University), Xusheng Xiao (Case Western Reserve University)
13:55 | 5m Talk | JuCify: A Step Towards Android Code Unification for Enhanced Static Analysis | Technical Track | Jordan Samhi (University of Luxembourg), Jun Gao (University of Luxembourg, Luxembourg), Nadia Daoudi (SnT, University of Luxembourg), Pierre Graux (University of Luxembourg), Henri Hoyez, Xiaoyu Sun (Monash University), Kevin Allix (University of Luxembourg), Tegawendé F. Bissyandé (SnT, University of Luxembourg), Jacques Klein (University of Luxembourg)
14:00 | 5m Talk | Difuzer: Uncovering Suspicious Hidden Sensitive Operations in Android Apps | Technical Track | Jordan Samhi (University of Luxembourg), Li Li (Monash University), Tegawendé F. Bissyandé (SnT, University of Luxembourg), Jacques Klein (University of Luxembourg)
14:05 | 5m Talk | FeatCompare: Feature Comparison for Competing Mobile Apps Leveraging User Reviews | Journal-First Papers | Maram Assi (Queen's University), Safwat Hassan (Thompson Rivers University), Yuan Tian (Queens University, Kingston, Canada), Ying Zou (Queen's University, Kingston, Ontario)