The success of data scientists in developing machine learning models depends on an iterative development process for detecting patterns in data, finding and extracting useful features, and maximizing their models’ performance. However, they often struggle during model development, becoming stuck and unable to make significant progress. We collected qualitative and quantitative data from the workflow of data scientists that allow us to examine and learn from such moments of stuckness. We used these data to develop a model for predicting stuckness based on real-time indicators, such as code artifacts, and then used the model to develop an innovative algorithm that determines precisely when a potential stuckness intervention should occur: as close as possible to the beginning of actual stuckness. Our algorithm’s performance indicates the potential efficacy of predicting data scientist stuckness algorithmically under real-world circumstances and for real-world needs.
Robert Jungnickel, RWTH Aachen University - Information Management in Mechanical Engineering
Aymen Gannouni, RWTH Aachen University - Information Management in Mechanical Engineering
Anas Abdelrazeq, RWTH Aachen University - Information Management in Mechanical Engineering
Ingrid Isenhardt, RWTH Aachen University - Information Management in Mechanical Engineering