Exploring ChatGPT for Toxicity Detection in GitHub
Fostering a collaborative and inclusive environment is crucial for the sustained progress of open-source development. However, the prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity. To identify such negativity in project communications, especially within large projects, automated toxicity detection models are necessary. Training these models effectively requires large software engineering-specific toxicity datasets. However, such datasets are limited in availability and often highly imbalanced (e.g., only 6 in 1,000 GitHub issues are toxic) [1], which complicates training effective detection models. To address this problem, we explore a zero-shot large language model (ChatGPT) that is pre-trained on massive corpora but not fine-tuned for the task of detecting toxicity in software-related text. Our preliminary evaluation indicates that ChatGPT shows promise in detecting toxicity in GitHub and warrants further investigation. We experimented with various prompts, including prompts designed to elicit justifications for model outputs, thereby enhancing interpretability and paving the way for potential integration of ChatGPT-enabled toxicity detection into developer communication channels.
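To make the zero-shot setup concrete, the sketch below shows how a single GitHub comment could be classified with a prompt that also asks for a one-sentence justification. This is a minimal illustration using the OpenAI Python client; the prompt wording, the model name, and the `classify_toxicity` helper are assumptions for illustration, not the exact prompts or configuration evaluated in the paper.

```python
# Minimal sketch of zero-shot toxicity classification with ChatGPT.
# Assumes the `openai` Python package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative prompt: label the comment and justify the label (not the paper's exact prompt).
PROMPT_TEMPLATE = (
    "You are moderating a software engineering community.\n"
    "Decide whether the following GitHub comment is toxic.\n"
    "Answer with 'toxic' or 'non-toxic', then give a one-sentence justification.\n\n"
    "Comment: {comment}"
)

def classify_toxicity(comment: str) -> str:
    """Return the model's label and justification for a single comment."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",          # assumed model choice
        temperature=0,                   # keep the classification output stable
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(comment=comment)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(classify_toxicity("This patch is garbage and so are you."))
```

Asking for a justification alongside the label is what the abstract refers to as improving interpretability: the returned explanation can be surfaced to moderators or maintainers instead of a bare toxic/non-toxic flag.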
Wed 17 Apr (displayed time zone: Lisbon)
14:00 - 15:30 | LLM, NN and other AI technologies 1 (Journal-first Papers / Research Track / New Ideas and Emerging Results) at Luis de Freitas Branco
Chair(s): Shin Yoo (Korea Advanced Institute of Science and Technology)

14:00 (15m) Talk | EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning | Research Track
Liuqing Chen (Zhejiang University), Yunnong Chen (Zhejiang University), Shuhong Xiao, Yaxuan Song (Zhejiang University), Lingyun Sun (Zhejiang University), Yankun Zhen (Alibaba Group), Tingting Zhou (Alibaba Group), Yanfang Chang (Alibaba Group)
Link to publication, Pre-print, Media Attached, File Attached

14:15 (15m) Talk | A Comprehensive Study of Learning-based Android Malware Detectors under Challenging Environments | Research Track
Gao Cuiying (Huazhong University of Science and Technology), Gaozhun Huang (Huazhong University of Science and Technology), Heng Li (Huazhong University of Science and Technology), Bang Wu (Huazhong University of Science and Technology), Yueming Wu (Nanyang Technological University), Wei Yuan (Huazhong University of Science and Technology)

14:30 (15m) Talk | Toward Automatically Completing GitHub Workflows | Research Track
Antonio Mastropaolo (Università della Svizzera italiana), Fiorella Zampetti (University of Sannio, Italy), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana), Massimiliano Di Penta (University of Sannio, Italy)
Pre-print

14:45 (15m) Talk | UniLog: Automatic Logging via LLM and In-Context Learning | Research Track
Junjielong Xu (The Chinese University of Hong Kong, Shenzhen), Ziang Cui (Southeast University), Yuan Zhao (Peking University), Xu Zhang (Microsoft Research), Shilin He (Microsoft Research), Pinjia He (Chinese University of Hong Kong, Shenzhen), Liqun Li (Microsoft Research), Yu Kang (Microsoft Research), Qingwei Lin (Microsoft), Yingnong Dang (Microsoft Azure), Saravan Rajmohan (Microsoft 365), Dongmei Zhang (Microsoft Research)

15:00 (7m) Talk | Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules | Journal-first Papers
Steve Kommrusch (Leela AI), Martin Monperrus (KTH Royal Institute of Technology), Louis-Noël Pouchet (Colorado State University)

15:07 (7m) Talk | NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR | Journal-first Papers
Orlando Amaral (University of Luxembourg), Muhammad Ilyas Azeem (University of Luxembourg), Sallam Abualhaija (University of Luxembourg), Lionel Briand (University of Ottawa, Canada; Lero centre, University of Limerick, Ireland)

15:14 (7m) Talk | Exploring ChatGPT for Toxicity Detection in GitHub | New Ideas and Emerging Results