DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research
Today, software developers work on complex and fast-moving projects that often require instant assistance from other domain and subject matter experts. Chat servers such as Discord facilitate live communication and collaboration among developers all over the world. With numerous topics discussed in parallel, mining and analyzing the chat data of these platforms would offer researchers and tool makers opportunities to develop software tools and services such as automated virtual assistants, chatbots, chat summarization techniques, Q&A thesaurus, and more.
In this paper, we propose a dataset called DISCO consisting of one year of public DIScord chat COnversations from four software development communities (Python, Go, Clojure, Racket). We collected the chat data of the channels containing general programming Q&A discussions from the four Discord servers, applied a disentanglement technique to extract conversations from the chat transcripts, and manually validated a random sample of 500 conversations. Our dataset consists of 28,712 conversations and 1,508,093 messages posted by 323,562 users. As a case study on the dataset, we applied a topic modeling technique to extract the five most-discussed general topics in each Discord channel.
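The manual validation step draws a random sample of 500 conversations. The abstract does not say how the sample was drawn, so the following is only a minimal sketch of one reproducible way to do it. The function name `draw_validation_sample`, the seed, and the placeholder conversation IDs are hypothetical and are not taken from the DISCO dataset or its tooling.

```python
import random


def draw_validation_sample(conversation_ids, sample_size=500, seed=42):
    """Return a reproducible simple random sample of conversation IDs.

    Assumption: uniform sampling without replacement; the paper's abstract
    does not state the sampling scheme that was actually used.
    """
    ids = sorted(conversation_ids)  # fixed order, so the seed alone determines the sample
    if sample_size > len(ids):
        raise ValueError("sample_size exceeds the number of conversations")
    return random.Random(seed).sample(ids, sample_size)


# Placeholder IDs standing in for the 28,712 extracted conversations.
all_ids = [f"conv-{i}" for i in range(28712)]
sample = draw_validation_sample(all_ids)
print(len(sample), len(set(sample)))  # prints "500 500"
```

Fixing the seed and sorting the IDs first means the same 500 conversations can be drawn again later, which helps if several annotators need to review an identical sample.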
Wed 18 May. Displayed time zone: Eastern Time (US & Canada)
| 14:00 - 14:50 | Session 5: Communication & Domains. Data and Tool Showcase Track / Technical Papers at MSR Main room - even hours. Chair(s): Masud Rahman, Dalhousie University; Mahmoud Alfadel, University of Waterloo |
| 14:00 (7m, Talk) | Painting the Landscape of Automotive Software in GitHub. Technical Papers. Sangeeth Kochanthara, Eindhoven University of Technology; Yanja Dajsuren, Eindhoven University of Technology; Loek Cleophas, Eindhoven University of Technology (TU/e) and Stellenbosch University (SU); Mark van den Brand, Eindhoven University of Technology |
| 14:07 (7m, Full-paper) | Mining the Usage of Reactive Programming APIs: A Study on GitHub and Stack Overflow. Technical Papers. Carlos Zimmerle, Federal University of Pernambuco; Kiev Gama, Federal University of Pernambuco; Fernando Castor, Utrecht University & Federal University of Pernambuco; José Murilo Filho, Federal University of Pernambuco |
| 14:14 (4m, Talk) | SoCCMiner: A Source Code-Comments and Comment-Context Miner. Data and Tool Showcase Track. Murali Sridharan, University of Oulu; Mika Mäntylä, University of Oulu; Maëlick Claes, University of Oulu; Leevi Rantala, University of Oulu |
| 14:18 (4m, Talk) | SLNET: A Redistributable Corpus of 3rd-party Simulink Models. Data and Tool Showcase Track. Sohil Lal Shrestha, The University of Texas at Arlington; Shafiul Azam Chowdhury, University of Texas at Arlington; Christoph Csallner, University of Texas at Arlington |
| 14:22 (4m, Talk) | SOSum: A Dataset of Stack Overflow Post Summaries. Data and Tool Showcase Track. Bonan Kou, Purdue University; Yifeng Di, Purdue University; Muhao Chen, University of Southern California; Tianyi Zhang, Purdue University |
| 14:26 (4m, Talk) | Inspect4py: A Knowledge Extraction Framework for Python Code Repositories. Data and Tool Showcase Track |
| 14:30 (4m, Talk) | DISCO: A Dataset of Discord Chat Conversations for Software Engineering Research. Data and Tool Showcase Track. Keerthana Muthu Subash, Carleton University, Canada; Lakshmi Prasanna Kumar, Carleton University, Canada; Sri Lakshmi Vadlamani, Carleton University, Canada; Preetha Chatterjee, Drexel University, USA; Olga Baysal, Carleton University |
| 14:34 (16m, Live Q&A) | Discussions and Q&A. Technical Papers |

