The Unsolvable Problem or the Unheard Answer? A Dataset of 24,669 Open-Source Software Conference Talks
Talks at practitioner-focused open-source software conferences are a valuable source of information for software engineering researchers. They provide a pulse of the community and are valuable source material for grey literature analysis. We curated a dataset of 24,669 talks from 87 open-source conferences between 2010 and 2021. We stored all relevant metadata from these conferences and provide scripts to collect the transcripts. We believe this data is useful for answering many kinds of questions, such as: What are the important/highly discussed topics within practitioner communities? How do practitioners interact? And how do they present themselves to the public? We demonstrate the usefulness of this data by reporting our findings from two small studies: a topic model analysis providing an overview of open-source community dynamics since 2011 and a qualitative analysis of a smaller community-oriented sample within our dataset to gain a better understanding of why contributors leave open-source.
Wed 18 May. Displayed time zone: Eastern Time (US & Canada)
21:00 - 21:50 | Session 7: Developer Wellbeing & Project Communication | Technical Papers / Data and Tool Showcase Track / Industry Track at MSR Main room - odd hours | Chair(s): Bram Adams (Queen's University, Kingston, Ontario)
21:00 | 7m | Talk | On the Violation of Honesty in Mobile Apps: Automated Detection and Categories (Distinguished Paper Award) | Technical Papers | Humphrey Obie (Monash University), Idowu Oselumhe Ilekura (Data Science Nigeria), Hung Du (Applied Artificial Intelligence Institute, Deakin University), Mojtaba Shahin (RMIT University, Australia), John Grundy (Monash University), Li Li (Monash University), Jon Whittle (CSIRO's Data61 and Monash University), Burak Turhan (University of Oulu)
21:07 | 7m | Talk | How heated is it? Understanding GitHub locked issues | Technical Papers | Isabella Ferreira (Polytechnique Montréal), Bram Adams (Queen's University, Kingston, Ontario), Jinghui Cheng (Polytechnique Montreal)
21:14 | 4m | Talk | The OCEAN mailing list data set: Network analysis spanning mailing lists and code repositories | Data and Tool Showcase Track | Melanie Warrick (University of Vermont), Samuel F. Rosenblatt (University of Vermont), Jean-Gabriel Young (University of Vermont), amanda casari (Open Source Programs Office, Google), Laurent Hébert-Dufresne (University of Vermont), James P. Bagrow (University of Vermont)
21:18 | 4m | Talk | The Unexplored Treasure Trove of Phabricator Code Reviews | Data and Tool Showcase Track | Gunnar Kudrjavets (University of Groningen), Nachiappan Nagappan (Microsoft Research), Ayushi Rastogi (University of Groningen, The Netherlands)
21:22 | 4m | Talk | The Unsolvable Problem or the Unheard Answer? A Dataset of 24,669 Open-Source Software Conference Talks | Data and Tool Showcase Track | Kimberly Truong (Oregon State University), Courtney Miller (Carnegie Mellon University), Bogdan Vasilescu (Carnegie Mellon University, USA), Christian Kästner (Carnegie Mellon University)
21:26 | 4m | Talk | Exploring Apache Incubator Project Trajectories with APEX | Data and Tool Showcase Track | Anirudh Ramchandran (University of California, Davis), Likang Yin (University of California, Davis), Vladimir Filkov (University of California at Davis)
21:30 | 7m | Talk | A Culture of Productivity: Maximizing Productivity by Maximizing Wellbeing | Industry Track | Brian Houck (Microsoft Research)
21:37 | 13m | Live Q&A | Discussions and Q&A | Technical Papers