This program is tentative and subject to change.
The rapid growth of audio-centric platforms and applications such as WhatsApp and Twitter has transformed the way people communicate and share audio content. However, these platforms are increasingly misused to disseminate harmful audio content, such as hate speech, deceptive advertisements, and explicit material, which can have serious negative consequences (e.g., detrimental effects on mental health). In response, researchers and practitioners have been actively developing and deploying audio content moderation tools to tackle this issue. Despite these efforts, malicious actors can bypass moderation systems through subtle alterations to audio content, such as modifying pitch or inserting noise, and the effectiveness of modern audio moderation tools against such adversarial inputs remains insufficiently studied. To address these challenges, we propose MTAM, a Metamorphic Testing framework for Audio content Moderation software. Specifically, we conduct a pilot study on 2,000 audio clips and define 14 metamorphic relations across two perturbation categories: Audio Features-Based and Heuristic perturbations. MTAM applies these metamorphic relations to toxic audio content to generate test cases that remain harmful while being more likely to evade detection. In our evaluation, we employ MTAM to test five commercial audio content moderation systems and an academic model against three kinds of toxic content. The results show that MTAM achieves up to 38.6%, 18.3%, 35.1%, 16.7%, and 51.1% error finding rates (EFR) when testing commercial moderation software provided by Gladia, Assembly AI, Baidu, Nextdata, and Tencent, respectively, and up to 45.7% EFR when testing a state-of-the-art academic model.
In addition, we leverage the test cases generated by MTAM to retrain the explored model, which substantially improves its robustness (EFR drops to nearly 0%) while maintaining accuracy on the original test set.
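To illustrate the idea behind such a metamorphic relation, the sketch below shows a noise-insertion perturbation of the kind the abstract mentions: mixing low-level background noise into a clip leaves the spoken content intact, so a moderation system's verdict should not change between the original and the perturbed input. This is a minimal, hypothetical example (the function name, SNR parameter, and toy sine-wave "clip" are illustrative assumptions, not MTAM's actual implementation):

```python
import numpy as np

def add_background_noise(audio: np.ndarray, snr_db: float = 20.0,
                         seed: int = 0) -> np.ndarray:
    """Mix Gaussian noise into a waveform at a target signal-to-noise ratio.

    Hypothetical sketch of a heuristic perturbation: at a high SNR the
    spoken content is unchanged, so a moderation label should not flip.
    """
    rng = np.random.default_rng(seed)
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

# Toy 1-second waveform standing in for a toxic audio clip.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
clip = 0.5 * np.sin(2 * np.pi * 440 * t)

perturbed = add_background_noise(clip, snr_db=20.0)

# Metamorphic check: the perturbation is small, so feeding `clip` and
# `perturbed` to the same moderation system should yield the same label.
residual = perturbed - clip
measured_snr = 10 * np.log10(np.mean(clip ** 2) / np.mean(residual ** 2))
print(measured_snr)  # close to the requested 20 dB
```

In a real test harness, both waveforms would be sent to the moderation API under test, and any disagreement between the two verdicts would be counted toward the error finding rate.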
Mon 17 Nov (displayed time zone: Seoul)
14:00 - 15:30
14:00 10m Talk | Mokav: Execution-driven Differential Testing with LLMs (Journal-First Track) | Khashayar Etemadi (ETH Zurich), Bardia Mohammadi (Sharif University of Technology), Zhendong Su (ETH Zurich), Martin Monperrus (KTH Royal Institute of Technology)
14:10 10m Talk | Validity-Preserving Delta Debugging via Generator Trace Reduction (Journal-First Track) | Luyao Ren (Peking University), Xing Zhang (Peking University), Ziyue Hua (Peking University), Yanyan Jiang (Nanjing University), Xiao He (Bytedance), Yingfei Xiong (Peking University), Tao Xie (Peking University)
14:20 10m Talk | Execution-Aware Program Reduction for WebAssembly via Record and Replay (Research Papers) | Doehyun Baek (University of Stuttgart), Daniel Lehmann (Google, Germany), Ben L. Titzer (Carnegie Mellon University), Sukyoung Ryu (KAIST), Michael Pradel (CISPA Helmholtz Center for Information Security)
14:30 10m Talk | DebCovDiff: Differential Testing of Coverage Measurement Tools on Real-World Projects (Research Papers) | Wentao Zhang (University of Illinois Urbana-Champaign), Jinghao Jia (University of Illinois Urbana-Champaign), Erkai Yu (University of Illinois Urbana-Champaign), Darko Marinov (University of Illinois at Urbana-Champaign), Tianyin Xu (University of Illinois at Urbana-Champaign)
14:40 10m Talk | DRIFT: Debug-based Trace Inference for Firmware Testing (Research Papers) | Changming Liu (Northeastern University), Alejandro Mera (Northeastern University), Meng Xu (University of Waterloo), Engin Kirda (Northeastern University)
14:50 10m Talk | Enhancing Differential Testing With LLMs For Testing Deep Learning Libraries (Journal-First Track) | Meiziniu LI (The Hong Kong University of Science and Technology), Dongze Li (The Hong Kong University of Science and Technology), Jianmeng Liu (The Hong Kong University of Science and Technology), Jialun Cao (Hong Kong University of Science and Technology), Yongqiang Tian (Monash University), Shing-Chi Cheung (Hong Kong University of Science and Technology)
15:00 10m Talk | Unit Test Update through LLM-Driven Context Collection and Error-Type-Aware Refinement (Research Papers) | Yuanhe Zhang (Zhejiang University), Zhiquan Yang (Zhejiang University), Shengyi Pan (Zhejiang University), Zhongxin Liu (Zhejiang University)
15:10 10m Talk | Metamorphic Testing for Audio Content Moderation Software (Research Papers) | Wenxuan Wang (Hong Kong University of Science and Technology), Yongjiang Wu (The Chinese University of Hong Kong), Junyuan Zhang (The Chinese University of Hong Kong), Shuqing Li (The Chinese University of Hong Kong), Yun Peng (The Chinese University of Hong Kong), Wenting Chen (City University of Hong Kong), Shuai Wang (Hong Kong University of Science and Technology), Michael Lyu (The Chinese University of Hong Kong)
15:20 10m Talk | Comprehend, Imitate, and then Update: Unleashing the Power of LLMs in Test Suite Evolution (Research Papers) | Tangzhi Xu (Nanjing University), Jianhan Liu (Nanjing University), Yuan Yao (Nanjing University), Cong Li (ETH Zurich), Feng Xu (Nanjing University), Xiaoxing Ma (Nanjing University)