Fri 2 May 2025, 16:00 - 16:15, at 206 plus 208 (Human and Social for AI). Chair(s): Ramiro Liscano

Hallucinations, i.e., irrelevant or incorrect responses, are a prevalent concern in generative AI-based tools like ChatGPT. Although hallucinations in ChatGPT have been studied for textual responses, it is unknown how ChatGPT hallucinates on technical texts that contain both natural language and technical terms. We surveyed 47 software engineers and produced a benchmark of 412 Q&A pairs from the bug reports of two OSS projects. We find that a RAG-based ChatGPT (i.e., ChatGPT tuned with the benchmark issue reports) answers the questions correctly only 36.4% of the time, for two reasons: 1) limitations in understanding complex technical content in code snippets such as stack traces, and 2) limitations in integrating the contexts denoted in the technical terms and texts. We present CHIME (ChatGPT Inaccuracy Mitigation Engine), whose underlying principle is that better preprocessing of the technical reports and a guided query validation process in ChatGPT can address the observed limitations. CHIME uses a context-free grammar (CFG) to parse stack traces in technical reports. CHIME then verifies and fixes ChatGPT responses by applying metamorphic testing and query transformation. On our benchmark, CHIME yields 30.3% more correct responses than ChatGPT alone. In a user study, we find that the improved responses produced with CHIME are considered more useful than those generated by ChatGPT without CHIME.
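The abstract describes preprocessing stack traces into structured form before they reach the model. As a rough illustration of that preprocessing step, the sketch below parses a Java-style stack trace into structured frames. The frame pattern and field names are illustrative assumptions for this example, not CHIME's actual CFG.

```python
import re

# Hypothetical frame pattern for lines like:
#   at com.example.Cache.get(Cache.java:42)
# This is a simplified stand-in for CHIME's grammar-based parsing.
FRAME = re.compile(
    r"at (?P<cls>[\w.$]+)\.(?P<method>[\w<>$]+)"
    r"\((?P<file>[\w.]+):(?P<line>\d+)\)"
)

def parse_stack_trace(text):
    """Extract structured frames (class, method, file, line) from raw trace text."""
    frames = []
    for raw in text.splitlines():
        m = FRAME.search(raw.strip())
        if m:
            frames.append({
                "class": m.group("cls"),
                "method": m.group("method"),
                "file": m.group("file"),
                "line": int(m.group("line")),
            })
    return frames

trace = """java.lang.NullPointerException
    at com.example.Cache.get(Cache.java:42)
    at com.example.Service.handle(Service.java:17)"""

frames = parse_stack_trace(trace)
```

Structured frames like these can then be fed to the model as explicit context instead of an opaque text blob, which is the kind of preprocessing the paper argues reduces misreadings of technical content.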

Fri 2 May

Displayed time zone: Eastern Time (US & Canada)

16:00 - 17:30
Human and Social for AI (Research Track / SE in Society (SEIS) / SE In Practice (SEIP)) at 206 plus 208
Chair(s): Ramiro Liscano Ontario Tech University
16:00
15m
Talk
ChatGPT Inaccuracy Mitigation during Technical Report Understanding: Are We There Yet?
Research Track
Salma Begum Tamanna University of Calgary, Canada, Gias Uddin York University, Canada, Song Wang York University, Lan Xia IBM, Canada, Longyu Zhang IBM, Canada
16:15
15m
Talk
Navigating the Testing of Evolving Deep Learning Systems: An Exploratory Interview Study
Research Track
Hanmo You Tianjin University, Zan Wang Tianjin University, Bin Lin Hangzhou Dianzi University, Junjie Chen Tianjin University
16:30
15m
Talk
An Empirical Study on Decision-Making Aspects in Responsible Software Engineering for AI (Artifact Available)
SE In Practice (SEIP)
Lekshmi Murali Rani Chalmers University of Technology and University of Gothenburg, Sweden, Faezeh Mohammadi Chalmers University of Technology and University of Gothenburg, Sweden, Robert Feldt Chalmers University of Technology and University of Gothenburg, Richard Berntsson Svensson Chalmers University of Technology and University of Gothenburg
Pre-print
16:45
15m
Talk
Curious, Critical Thinker, Empathetic, and Ethically Responsible: Essential Soft Skills for Data Scientists in Software Engineering
SE in Society (SEIS)
Matheus de Morais Leça University of Calgary, Ronnie de Souza Santos University of Calgary
17:00
15m
Talk
Multi-Modal LLM-based Fully-Automated Training Dataset Generation Software Platform for Mathematics Education
SE in Society (SEIS)
Minjoo Kim Sookmyung Women's University, Tae-Hyun Kim Sookmyung Women's University, Jaehyun Chung Korea University, Hyunseok Choi Korea University, Seokhyeon Min Korea University, Joon-Ho Lim Tutorus Labs, Soohyun Park Sookmyung Women's University
17:15
15m
Talk
What Does a Software Engineer Look Like? Exploring Societal Stereotypes in LLMs
SE in Society (SEIS)
Muneera Bano CSIRO's Data61, Hashini Gunatilake Monash University, Rashina Hoda Monash University