Analyzing and Detecting Toxicities in Developer Online Chatrooms: A Fine-Grained Taxonomy and Automated Detection Approach
Developer online chatrooms serve as crucial bridges for maintaining connections among developers of Open Source Software (OSS) projects. Nevertheless, not all developers within these rapidly growing user communities are patient and friendly. Uncivil and provocative messages from unfriendly users can significantly undermine the harmony within online developer chatrooms, leading to negative effects such as attrition and reduced activity. Such messages also put pressure on chatroom management. To facilitate the healthy development of online chatrooms, it is imperative to analyze the toxicity in developer chatrooms and to develop automated detection techniques. In this paper, we collect chat messages from three representative active chatrooms on Gitter. We conduct an in-depth analysis of these messages at the level of discussion threads, examining their intent and sentiments. Employing a card-sorting method, we further construct a fine-grained taxonomy comprising seven toxicity categories and manually annotate a dataset consisting of 5,158 threads. The taxonomy and the annotated dataset help us better understand the nature of toxicity in developer chatrooms and the shortcomings of existing methods. Furthermore, we propose an automated binary toxicity detection method that integrates textual features, non-textual features, and negative sentiment features obtained from a Large Language Model (LLM) to determine whether a thread is toxic. Experimental results demonstrate that our approach achieves an average F1-Score of 0.546, a 57.8% improvement over the best-performing baseline. Additionally, we validate the effectiveness of incorporating the non-textual features and the negative sentiment features derived from the LLM.
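As a rough illustration of the quantities mentioned in the abstract, the sketch below shows one way three feature groups could be combined into a single vector for a binary classifier, how an F1-Score is computed from binary predictions, and how a relative improvement over a baseline is calculated. It is not the authors' implementation: the function names (`fuse_features`, `f1_score`, `relative_improvement`, `implied_baseline`), the use of simple concatenation, and the reading of "57.8% improvement" as a relative improvement are all assumptions made for illustration. The abstract does not state the baseline score.

```python
from typing import List


def fuse_features(textual: List[float],
                  non_textual: List[float],
                  sentiment: List[float]) -> List[float]:
    """Concatenate three feature groups into one vector.

    Assumption: plain concatenation; the paper may fuse features differently.
    """
    return list(textual) + list(non_textual) + list(sentiment)


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for the positive (toxic = 1) class of a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, as a fraction (0.5 means 50%)."""
    return (new - baseline) / baseline


def implied_baseline(new: float, rel_gain: float) -> float:
    """Baseline score implied by a new score and a relative gain.

    Assumption: the reported improvement is relative, not absolute.
    """
    return new / (1.0 + rel_gain)
```

Under the relative-improvement assumption, `implied_baseline(0.546, 0.578)` gives the baseline F1-Score that the reported figures would imply. An absolute (percentage-point) reading would give a different value.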
Wed 4 Dec
Displayed time zone: Beijing, Chongqing, Hong Kong, Urumqi
14:00 - 15:30 | Session (3), Technical Track, at Room 3 (Xiangquan Ballroom) | Chair(s): Ian Gorton (Northeastern University – Seattle, USA)
14:00 | 30m | Talk | Integrating Feedback From Application Reviews Into Software Development for Improved User Satisfaction | Technical Track | Omar Adbealziz (University of Saskatchewan), Zadia Codabux (University of Saskatchewan), Kevin Schneider (University of Saskatchewan)
14:30 | 30m | Talk | Analyzing and Detecting Toxicities in Developer Online Chatrooms: A Fine-Grained Taxonomy and Automated Detection Approach | Technical Track | Junyi Tian (Zhejiang University), Lingfeng Bao (Zhejiang University), Shengyi Pan, Xing Hu (Zhejiang University)
15:00 | 30m | Talk | Adversarial Classification Rumor Detection based on Social Communication Networks and Time Series Features | Technical Track | Xinyu Zhang (Sun Yat-sen University), Zixin Chang (Chongqing University), Junhao Wen (Chongqing University), Wei Zhou (Chongqing University), Li Li (Beihang University)