A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case
This program is tentative and subject to change.
The use of Large Language Models (LLMs) to support tasks in software development has steadily increased over recent years, from assisting developers in coding activities to providing conversational agents that answer newcomers’ questions. In collaboration with the Mozilla Foundation, this study evaluates the effectiveness of Retrieval-Augmented Generation (RAG) in assisting developers within the Mozilla Firefox project. We conducted an empirical analysis comparing responses from human developers, a standard GPT model, and a GPT model enhanced with RAG, using real queries from Mozilla’s developer chat rooms. To ensure a rigorous evaluation, Mozilla experts assessed the responses based on helpfulness, comprehensiveness, and conciseness. The results show that RAG-assisted responses were more comprehensive than those of human developers (62.50% vs. 54.17%) and almost as helpful (75.00% vs. 79.17%), suggesting RAG’s potential to enhance developer assistance. However, the RAG responses were less concise and often verbose. These findings indicate that RAG-based tools could be applied to Open Source Software (OSS) projects to reduce the load on core maintainers without losing answer quality. Toning down retrieval mechanisms and making responses shorter in the future would further enhance developer assistance in massive projects like Mozilla Firefox.
Fri 17 Apr
Displayed time zone: Brasilia, Distrito Federal, Brazil
11:00 - 12:30
11:00 | 15m | Talk | Environment-Aware Code Generation: How far are We? | Research Track | Tongtong Wu Monash University, Rongyi Chen Southeast University, Wenjie Du Southeast University, Suyu Ma CSIRO's Data61, Guilin Qi Southeast University, Zhenchang Xing CSIRO's Data61, Shahram Khadivi eBay Inc., Ramesh Periyathambi eBay Inc., Gholamreza Haffari Monash University
11:15 | 15m | Talk | LLM-based API Argument Completion with Knowledge-Augmented Prompts | Research Track | Waseem Akram Beijing Institute of Technology, Yanjie Jiang Tianjin University, Haris Ali Khan Beijing Institute of Technology, Furqan Jalil Beijing Institute of Technology, Hui Liu Beijing Institute of Technology
11:30 | 15m | Talk | Distance-Guided Search in Program Synthesis with Imperfect LLM Solutions | Research Track
11:45 | 15m | Talk | Automatic Dockerfile Generation with Large Language Models | Research Track | Jun Lyu Nanjing University, He Zhang Nanjing University, Yusong Yuan Nanjing University, Lanxin Yang Nanjing University, Yue Li Nanjing University, Manuel Rigger National University of Singapore
12:00 | 15m | Talk | A Causal Perspective on Measuring, Explaining and Mitigating Smells in LLM-Generated Code | Research Track | Alejandro Velasco William & Mary, Daniel Rodriguez-Cardenas William & Mary, Dipin Khati William & Mary, David N. Palacio Microsoft, Lutfar Rahman Alif University of Dhaka, Denys Poshyvanyk William & Mary
12:15 | 15m | Talk | A Comparison of Conversational Models and Humans in Answering Technical Questions: the Firefox Case | Research Track | João Correia PUC-Rio, Daniel Coutinho Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marco Castelluccio Mozilla, Caio Barbosa Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Igor Steinmacher RESHAPE LAB, Northern Arizona University, USA, Marco Gerosa Northern Arizona University, Alessandro Garcia Pontifical Catholic University of Rio de Janeiro, Rafael de Mello UFRJ, Brazil, Anita Sarma Oregon State University