Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants
As Large Language Models (LLMs) are increasingly adopted in software engineering (SE), recently in the form of conversational assistants, ensuring these technologies align with developers’ needs is essential. Traditional human-centered evaluation methods are difficult to apply to LLM-based tools at scale, raising the need for automatic evaluation. In this paper, we advocate combining insights from human-computer interaction (HCI) and artificial intelligence (AI) research to enable human-centered automatic evaluation of LLM-based conversational SE assistants. We identify requirements for such evaluation and challenges down the road, working towards a framework that ensures these assistants are designed and deployed in line with user needs.
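To make the idea of human-centered automatic evaluation concrete, the sketch below shows one possible shape such an evaluation could take: assistant responses are scored against explicit, user-derived criteria by a pluggable judge (for example an LLM-as-judge or replayed human ratings). This is an illustration only, not the framework proposed in the paper; the Interaction class, the criteria names, the weights, and the evaluate function are all hypothetical.

```python
# Minimal sketch (illustrative, not the paper's framework): score assistant
# replies against human-centered criteria using a pluggable judge function.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Interaction:
    user_query: str        # the developer's question to the assistant
    assistant_reply: str   # the conversational SE assistant's answer


# Hypothetical human-centered criteria and weights; in practice these would be
# derived from HCI studies of developer needs rather than hard-coded here.
CRITERIA: Dict[str, float] = {
    "correctness": 0.4,    # is the technical content of the reply right?
    "clarity": 0.3,        # can the developer act on the reply as written?
    "context_fit": 0.3,    # does the reply respect the developer's stated context?
}

# A judge maps (interaction, criterion name) to a score in [0, 1]; it could be
# an LLM-as-judge prompt, a trained classifier, or replayed human annotations.
Judge = Callable[[Interaction, str], float]


def evaluate(interactions: List[Interaction], judge: Judge) -> Dict[str, float]:
    """Average each criterion over all interactions, then weight into one score."""
    per_criterion = {
        name: sum(judge(it, name) for it in interactions) / len(interactions)
        for name in CRITERIA
    }
    overall = sum(CRITERIA[name] * per_criterion[name] for name in CRITERIA)
    return {**per_criterion, "overall": overall}
```

Keeping the judge behind a simple callable interface means the same harness can be run with automatic judges at scale and with human raters on a sample, which is one way to check that the automatic scores still reflect what users actually value.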
Sun 27 Apr (Displayed time zone: Eastern Time, US & Canada)

16:00 - 17:30 | Session 3: Evaluating and Improving Bot Impact | BotSE at 213 | Chair(s): Ahmad Abdellatif (University of Calgary)

16:00 | 22m | Talk | Towards a Newcomers Dataset to Assess Conversational Agent's Efficacy in Mentoring Newcomers | BotSE | Misan Etchie (NAU RESHAPE LAB), Hunter Beach (NAU RESHAPE LAB), Katia Romero Felizardo (NAU RESHAPE LAB), Igor Steinmacher (NAU RESHAPE LAB)
16:22 | 22m | Talk | Bot-Driven Development: From Simple Automation to Autonomous Software Development Bots | BotSE | Pre-print
16:45 | 22m | Talk | Bridging HCI and AI Research for the Evaluation of Conversational SE Assistants | BotSE
17:07 | 22m | Talk | Reducing Alert Fatigue via AI-Assisted Negotiation: A Case for Dependabot | BotSE | Raula Gaikovina Kula (The University of Osaka)