Big Data = Big Insights? Operationalizing Brooks’ Law in a Massive GitHub Data Set
Wed 11 May 2022 13:10 - 13:15 at ICSE room 3-odd hours - Software Economics Chair(s): Gregorio Robles
Massive data from software repositories and collaboration tools are widely used to study social aspects of software development. One question addressed by a number of recent works is how the size and structure of a software project influence team productivity, a question famously considered in Brooks’ law. Several recent studies using massive repository data suggest that developers in larger teams tend to be less productive than developers in smaller teams. Despite using similar methods and data, other studies argue for a positive linear or super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects exhibit diseconomies of scale.
In our work, we study epistemological challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide what is, to the best of our knowledge, the largest curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity. Our work contributes to the ongoing discussion on the choice of productivity metrics in the operationalization of hypotheses about determinants of successful software projects. It further highlights general pitfalls in the analysis of big data and shows that the use of bigger data sets does not automatically lead to more reliable insights.
Wed 11 May | Displayed time zone: Eastern Time (US & Canada)
05:00 - 06:00 | Human Aspects of SE 3 | SEIS - Software Engineering in Society / Technical Track / Journal-First Papers | ICSE room 4-odd hours | Chair(s): Yvonne Dittrich (IT University of Copenhagen, Denmark)
05:00 | 5m | Talk | Socio-Technical Grounded Theory for Software Engineering (Journal First Presentation) | Journal-First Papers | Rashina Hoda (Monash University)
05:05 | 5m | Talk | How are Diverse End-user Human-centric Issues Discussed on GitHub? | SEIS - Software Engineering in Society | Hourieh Khalajzadeh (Monash University, Australia); Mojtaba Shahin (RMIT University, Australia); Humphrey Obie (Monash University); John Grundy (Monash University)
05:10 | 5m | Talk | Good Fences Make Good Neighbours? On the Impact of Cultural and Geographical Dispersion on Community Smells | SEIS - Software Engineering in Society | Stefano Lambiase (University of Salerno); Gemma Catolino (Tilburg University & Jheronimus Academy of Data Science); Damian Andrew Tamburri (TU/e); Alexander Serebrenik (Eindhoven University of Technology); Fabio Palomba (University of Salerno); Filomena Ferrucci (University of Salerno)
05:15 | 5m | Talk | Open Data Inclusion through Narrative Approaches | SEIS - Software Engineering in Society
05:20 | 5m | Talk | GitHub Sponsors: Exploring a New Way to Contribute to Open Source | Technical Track | Naomichi Shimada (Nara Institute of Science and Technology); Tao Xiao (Nara Institute of Science and Technology); Hideaki Hata (Shinshu University); Christoph Treude (University of Melbourne); Kenichi Matsumoto (Nara Institute of Science and Technology)
05:25 | 5m | Talk | Big Data = Big Insights? Operationalizing Brooks’ Law in a Massive GitHub Data Set | Technical Track | Christoph Gote (Chair of Systems Design, ETH Zurich); Pavlin Mavrodiev (Chair of Systems Design, ETH Zurich); Frank Schweitzer (Chair of Systems Design, ETH Zurich); Ingo Scholtes (Chair of Computer Science XV - Machine Learning for Complex Networks, Julius-Maximilians-Universität Würzburg)