Automated Generation of Accessibility Test Reports from Recorded User Transcripts
Award Winner
Testing for accessibility is a critical step in software development, as it ensures that all users, including those with disabilities, can effectively engage with web and mobile applications. While automated tools exist to detect accessibility issues in software, none are as comprehensive and effective as user testing, where testers with various disabilities evaluate the application for accessibility and usability issues. However, user testing is unpopular with software developers, as it requires conducting lengthy interviews with users and later parsing through large recordings to derive the issues to fix. In this paper, we explore how large language models (LLMs) such as GPT-4, which have shown promising results in context comprehension and semantic text generation, can mitigate this issue and streamline the user testing process. Our solution, called Reca11, takes in informal transcripts of test recordings and extracts the accessibility and usability issues mentioned by the tester. Our systematic prompt engineering determines the optimal configuration of input, instruction, context, and demonstrations for best results. We evaluate Reca11's effectiveness on 36 user testing sessions across three applications. Based on the findings, we investigate the strengths and weaknesses of using LLMs in this space.
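The abstract describes a prompt built from four components: instruction, context, demonstrations, and the transcript as input. A minimal sketch of that assembly is shown below; the function name, wording, and example data are illustrative assumptions, not Reca11's actual prompt.

```python
# Illustrative sketch of a four-part prompt (instruction, context,
# demonstrations, input) as described in the abstract. All strings
# here are hypothetical examples, not taken from Reca11.

def build_prompt(instruction, context, demonstrations, transcript):
    """Assemble one prompt string from the four components."""
    demo_text = "\n\n".join(
        f"Transcript:\n{t}\nIssues:\n{i}" for t, i in demonstrations
    )
    return (
        f"{instruction}\n\n"
        f"Context: {context}\n\n"
        f"Examples:\n{demo_text}\n\n"
        f"Transcript:\n{transcript}\nIssues:"
    )

prompt = build_prompt(
    instruction="Extract accessibility and usability issues from the transcript.",
    context="The tester is a screen-reader user evaluating a web application.",
    demonstrations=[
        ("The menu button had no label, so I got lost.",
         "- Unlabeled menu button (accessibility)"),
    ],
    transcript="The checkout form timed out before I could finish typing.",
)
print(prompt)
```

Systematic prompt engineering, as the abstract uses the term, would vary which of these components are included and how they are phrased, then compare extraction quality across configurations.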
Wed 30 Apr (Displayed time zone: Eastern Time (US & Canada))
11:00 - 12:30 | AI for User Experience | SE In Practice (SEIP) / Demonstrations / Journal-first Papers / Research Track | Room 210 | Chair(s): Chunyang Chen (TU Munich)
11:00 15m Talk | Automated Generation of Accessibility Test Reports from Recorded User Transcripts (Award Winner) | Research Track
Syed Fatiul Huq (University of California, Irvine), Mahan Tafreshipour (University of California at Irvine), Kate Kalcevich (Fable Tech Labs Inc.), Sam Malek (University of California at Irvine)
11:15 15m Talk | KuiTest: Leveraging Knowledge in the Wild as GUI Testing Oracle for Mobile Apps | SE In Practice (SEIP)
Yongxiang Hu (Fudan University), Yu Zhang (Meituan), Xuan Wang (Fudan University), Yingjie Liu (School of Computer Science, Fudan University), Shiyu Guo (Meituan), Chaoyi Chen (Meituan), Xin Wang (Fudan University), Yangfan Zhou (Fudan University)
11:30 15m Talk | GUIWatcher: Automatically Detecting GUI Lags by Analyzing Mobile Application Screencasts | SE In Practice (SEIP)
Wei Liu (Concordia University, Montreal, Canada), Feng Lin (Concordia University), Linqiang Guo (Concordia University), Tse-Hsun (Peter) Chen (Concordia University), Ahmed E. Hassan (Queen's University)
11:45 15m Talk | GUIDE: LLM-Driven GUI Generation Decomposition for Automated Prototyping | Demonstrations
Kristian Kolthoff (Institute for Software and Systems Engineering, Clausthal University of Technology), Felix Kretzer (Human-Centered Systems Lab (h-lab), Karlsruhe Institute of Technology (KIT)), Christian Bartelt, Alexander Maedche (Human-Centered Systems Lab, Karlsruhe Institute of Technology), Simone Paolo Ponzetto (Data and Web Science Group, University of Mannheim) | Pre-print
12:00 15m Talk | Agent for User: Testing Multi-User Interactive Features in TikTok | SE In Practice (SEIP)
Sidong Feng (Monash University), Changhao Du (Jilin University), Huaxiao Liu (Jilin University), Qingnan Wang (Jilin University), Zhengwei Lv (ByteDance), Gang Huo (ByteDance), Xu Yang (ByteDance), Chunyang Chen (TU Munich)
12:15 7m Talk | Bug Analysis in Jupyter Notebook Projects: An Empirical Study | Journal-first Papers
Taijara Santana (Federal University of Bahia), Paulo Silveira Neto (Federal Rural University of Pernambuco), Eduardo Santana de Almeida (Federal University of Bahia), Iftekhar Ahmed (University of California at Irvine)