The rapid growth of audio-centric platforms and of audio-sharing features in applications such as WhatsApp and Twitter has transformed the way people communicate and share audio content. However, these platforms are increasingly misused to disseminate harmful audio content, such as hate speech, deceptive advertisements, and explicit material, which can have significant negative consequences (e.g., detrimental effects on mental health). In response, researchers and practitioners have been actively developing and deploying audio content moderation tools to tackle this issue. Despite these efforts, malicious actors can bypass moderation systems by making subtle alterations to audio content, such as modifying pitch or inserting noise, and the effectiveness of modern audio moderation tools against such adversarial inputs remains insufficiently studied. To address these challenges, we propose MTAM, a \underline{M}etamorphic \underline{T}esting framework for \underline{A}udio content \underline{M}oderation software. Specifically, we conduct a pilot study on $2000$ audio clips and define 14 metamorphic relations across two perturbation categories: Audio Feature-Based and Heuristic perturbations. MTAM applies these metamorphic relations to toxic audio content to generate test cases that remain harmful while being more likely to evade detection. In our evaluation, we employ MTAM to test five commercial audio content moderation software products and an academic model against three kinds of toxic content. The results show that MTAM achieves up to $38.6\%$, $18.3\%$, $35.1\%$, $16.7\%$, and $51.1\%$ error finding rates (EFR) when testing the commercial moderation software provided by Gladia, Assembly AI, Baidu, Nextdata, and Tencent, respectively, and up to $45.7\%$ EFR when testing the state-of-the-art academic model. In addition, we leverage the test cases generated by MTAM to retrain the academic model, which substantially improves its robustness (reducing the EFR to nearly $0\%$) while maintaining accuracy on the original test set.
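To make the two perturbation categories concrete, the sketch below shows how one audio-feature-based relation (pitch modification) and one heuristic relation (noise insertion) could be applied to a toxic seed clip to generate follow-up test cases. This is a minimal illustration, not MTAM's actual implementation: the file name \texttt{toxic\_seed.wav}, the perturbation parameters, and the helper functions are assumptions introduced here for exposition.

\begin{verbatim}
# Illustrative sketch (not MTAM's actual code): two example metamorphic
# relations applied to a toxic seed clip. Under each relation, the
# perturbed audio should still be flagged as toxic by the software under
# test; a changed verdict indicates a potential moderation error.
import numpy as np
import librosa
import soundfile as sf

def mr_pitch_shift(y, sr, n_steps=2):
    """Audio-feature-based relation: shift pitch by a few semitones."""
    return librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)

def mr_add_noise(y, sr, snr_db=20.0):
    """Heuristic relation: overlay low-level Gaussian noise at a given SNR."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise

# "toxic_seed.wav" is a placeholder for a seed clip labeled toxic.
y, sr = librosa.load("toxic_seed.wav", sr=None, mono=True)
for name, mr in [("pitch_shift", mr_pitch_shift), ("add_noise", mr_add_noise)]:
    sf.write(f"test_case_{name}.wav", mr(y, sr), sr)
\end{verbatim}

In such a workflow, the generated clips would be submitted to the moderation software under test, and any follow-up clip that is no longer flagged as toxic would be counted toward the error finding rate.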