Exploring ChatGPT for Toxicity Detection in GitHub
Fostering a collaborative and inclusive environment is crucial for the sustained progress of open-source development. However, the prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity. To identify such negativity in project communications, especially within large projects, automated toxicity detection models are necessary. Training these models effectively requires large software engineering-specific toxicity datasets. However, such datasets are limited in availability and often highly imbalanced (e.g., only 6 in 1,000 GitHub issues are toxic) [1], which complicates training effective detection models. To address this problem, we explore a zero-shot large language model (ChatGPT) that is pre-trained on massive corpora but not fine-tuned for the task of detecting toxicity in software-related text. Our preliminary evaluation indicates that ChatGPT shows promise in detecting toxicity in GitHub and warrants further investigation. We experimented with various prompts, including prompts designed to elicit justifications for model outputs, thereby enhancing interpretability and paving the way for potential integration of ChatGPT-enabled toxicity detection into developer communication channels.
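To make the zero-shot setup concrete, the sketch below shows how a single GitHub comment could be classified with a prompt that also asks for a one-sentence justification. This is a minimal illustration using the OpenAI Python client; the prompt wording, the model name, and the `classify_toxicity` helper are assumptions for illustration, not the exact prompts or configuration evaluated in the paper.

```python
# Minimal sketch of zero-shot toxicity classification with ChatGPT.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative prompt: label the comment and justify the label (not the paper's exact prompt).
PROMPT_TEMPLATE = (
    "You are moderating a software engineering community.\n"
    "Decide whether the following GitHub comment is toxic.\n"
    "Answer with 'toxic' or 'non-toxic', then give a one-sentence justification.\n\n"
    "Comment: {comment}"
)

def classify_toxicity(comment: str) -> str:
    """Return the model's label and justification for a single comment."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",          # assumed model choice
        temperature=0,                   # keep the classification output stable
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(comment=comment)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(classify_toxicity("This patch is garbage and so are you."))
```

Asking for a justification alongside the label is what the abstract refers to as improving interpretability: the returned explanation can be surfaced to moderators or maintainers instead of a bare toxic/non-toxic flag.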
Wed 17 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | LLM, NN and other AI technologies 1 (Journal-first Papers / Research Track / New Ideas and Emerging Results) at Luis de Freitas Branco
Chair(s): Shin Yoo (Korea Advanced Institute of Science and Technology)

14:00 (15m) Talk | EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning | Research Track
Liuqing Chen (Zhejiang University), Yunnong Chen (Zhejiang University), Shuhong Xiao, Yaxuan Song (Zhejiang University), Lingyun Sun (Zhejiang University), Yankun Zhen (Alibaba Group), Tingting Zhou (Alibaba Group), Yanfang Chang (Alibaba Group)
Link to publication, Pre-print, Media Attached, File Attached

14:15 (15m) Talk | A Comprehensive Study of Learning-based Android Malware Detectors under Challenging Environments | Research Track
Gao Cuiying (Huazhong University of Science and Technology), Gaozhun Huang (Huazhong University of Science and Technology), Heng Li (Huazhong University of Science and Technology), Bang Wu (Huazhong University of Science and Technology), Yueming Wu (Nanyang Technological University), Wei Yuan (Huazhong University of Science and Technology)

14:30 (15m) Talk | Toward Automatically Completing GitHub Workflows | Research Track
Antonio Mastropaolo (Università della Svizzera italiana), Fiorella Zampetti (University of Sannio, Italy), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana), Massimiliano Di Penta (University of Sannio, Italy)
Pre-print

14:45 (15m) Talk | UniLog: Automatic Logging via LLM and In-Context Learning | Research Track
Junjielong Xu (The Chinese University of Hong Kong, Shenzhen), Ziang Cui (Southeast University), Yuan Zhao (Peking University), Xu Zhang (Microsoft Research), Shilin He (Microsoft Research), Pinjia He (Chinese University of Hong Kong, Shenzhen), Liqun Li (Microsoft Research), Yu Kang (Microsoft Research), Qingwei Lin (Microsoft), Yingnong Dang (Microsoft Azure), Saravan Rajmohan (Microsoft 365), Dongmei Zhang (Microsoft Research)

15:00 (7m) Talk | Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules | Journal-first Papers
Steve Kommrusch (Leela AI), Martin Monperrus (KTH Royal Institute of Technology), Louis-Noël Pouchet (Colorado State University)

15:07 (7m) Talk | NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR | Journal-first Papers
Orlando Amaral (University of Luxembourg), Muhammad Ilyas Azeem (University of Luxembourg), Sallam Abualhaija (University of Luxembourg), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland)

15:14 (7m) Talk | Exploring ChatGPT for Toxicity Detection in GitHub | New Ideas and Emerging Results