Catch Me If You Can: Blackbox Adversarial Attacks on Automatic Speech Recognition using Frequency Masking (APSEC 2022 - Technical Track)

Who

Xiaoliang Wu, Ajitha Rajan

Track

APSEC 2022 Technical Track

Time Zone

The program is currently displayed in (GMT+09:00) Osaka, Sapporo, Tokyo.

When

Wed 7 Dec 2022, 13:00 - 13:20, at Room2 - Machine Learning 1. Chair(s): Syful Islam

Abstract

Automatic speech recognition (ASR) models are widely used in applications such as voice navigation and voice control of domestic appliances. Attackers have misused ASRs to generate malicious outputs by attacking the deep learning component within them. To assess the security and robustness of ASRs, we propose techniques within our framework, SPAT, that generate blackbox (agnostic to the DNN) adversarial attacks that are portable across ASRs. This is in contrast to existing work, which focuses on whitebox attacks that are time-consuming and lack portability.

Our techniques generate adversarial attacks that have no human-audible difference by manipulating the input speech signal using a psychoacoustic model that keeps the audio perturbations below the thresholds of human perception. The SPAT framework comprises three attack generation techniques based on the psychoacoustic concept and frame selection techniques to selectively target the attack. We evaluate the portability and effectiveness of our techniques using three popular ASRs and two input audio datasets, with four metrics: Word Error Rate (WER) of the output transcription, Similarity to the original audio, attack Success Rate on different ASRs, and Detection score by a defense system. We found that our adversarial attacks were portable across ASRs, were not easily detected by a state-of-the-art defense system, and produced significantly different output transcriptions while sounding similar to the original audio.
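
For readers unfamiliar with the first metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the evaluation code used by the authors; the function name `word_error_rate`, the lowercase normalization, and the whitespace tokenization are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: transcripts are lowercased and split on whitespace;
    real evaluations may normalize punctuation and numbers differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: two of the five reference words are substituted.
    original = "turn on the kitchen light"
    attacked = "turn off the kitchen lights"
    print(word_error_rate(original, attacked))
```

Under this definition, an adversarial input that leaves the transcription unchanged scores 0, and a higher WER indicates a larger change in the output transcription. WER can exceed 1 when the output contains many inserted words.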

Xiaoliang Wu

University of Edinburgh

United Kingdom

Ajitha Rajan