Strategies for Reuse and Sharing among Data Scientists in Software Teams
Wed 11 May 2022 20:15 - 20:20 at ICSE room 1-even hours - Machine Learning with and for SE 7 Chair(s): Lei Ma
Wed 25 May 2022 11:20 - 11:25 at Room 301+302 - Papers 6: Machine Learning with and for SE 1 Chair(s): Baishakhi Ray
Wed 25 May 2022 13:30 - 15:00 at Ballroom Gallery - Posters 1
Effective sharing and reuse practices have long been hallmarks of proficient software engineering. Yet the exploratory nature of data science presents new challenges and opportunities to support sharing and reuse of analysis code. To better understand current practices, we conducted interviews (N=17) and a survey (N=132) with data scientists at Microsoft, and extracted five commonly used strategies for sharing and reuse of past work: personal analysis reuse, personal utility libraries, team shared analysis code, team shared template notebooks, and team shared libraries. We also identify factors that encourage or discourage data scientists from sharing and reusing. Our participants described obstacles to reuse and sharing, including mismatched incentives to create shared code, difficulties in making data science code modular, and a lack of tool interoperability. We discuss how future tools might help meet these needs.
Displayed time zone: Eastern Time (US & Canada)
Wed 11 May
20:00 - 21:00 | Machine Learning with and for SE 7 | SEIP - Software Engineering in Practice / Technical Track / Journal-First Papers at ICSE room 1-even hours | Chair(s): Lei Ma (University of Alberta)
20:00 | 5m Talk | Journal First: On the Value of Oversampling for Deep Learning in Software Defect Prediction | Journal-First Papers
20:05 | 5m Talk | In-IDE Code Generation from Natural Language: Promise and Challenges | Journal-First Papers | Frank Xu (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University, USA), Graham Neubig (Carnegie Mellon University)
20:10 | 5m Talk | Dependency Tracking for Risk Mitigation in Machine Learning (ML) Systems | SEIP - Software Engineering in Practice | Xiwei (Sherry) Xu (CSIRO Data61), Chen Wang (CSIRO DATA61), Zhen Wang (CSIRO Data61), Qinghua Lu (CSIRO’s Data61), Liming Zhu (CSIRO’s Data61; UNSW)
20:15 | 5m Talk | Strategies for Reuse and Sharing among Data Scientists in Software Teams | SEIP - Software Engineering in Practice | Will Epperson (Carnegie Mellon University), April Wang (University of Michigan), Robert DeLine (Microsoft Research), Steven M. Drucker (Microsoft Research)
20:20 | 5m Talk | A Universal Data Augmentation Approach for Fault Localization | Technical Track | Huan Xie (Chongqing University), Yan Lei (School of Big Data & Software Engineering, Chongqing University), Meng Yan (Chongqing University), Yue Yu (College of Computer, National University of Defense Technology, Changsha 410073, China), Xin Xia (Huawei Software Engineering Application Technology Lab), Xiaoguang Mao (National University of Defense Technology)
20:25 | 5m Talk | Explanation-Guided Fairness Testing through Genetic Algorithm | Technical Track | Ming Fan (Xi'an Jiaotong University), Wenying Wei (Xi'an Jiaotong University), Wuxia Jin (Xi'an Jiaotong University), Zijiang Yang (Western Michigan University), Ting Liu (Xi'an Jiaotong University)