Practical and Efficient Model Extraction of Sentiment Analysis APIs (ICSE 2023 - Technical Track)

Who

Weibin Wu, Jianping Zhang, Victor Junqiu Wei, Xixian Chen, Zibin Zheng, Irwin King, Michael Lyu

Track

ICSE 2023 Technical Track

When

Wed 17 May 2023 14:00 - 14:15 at Meeting Room 104 - AI systems engineering Chair(s): Xin Peng

Abstract

Despite their stunning performance, deep learning models are formidable to develop from scratch. This difficulty has popularized Machine-Learning-as-a-Service (MLaaS), where general users access the trained models of MLaaS providers via Application Programming Interfaces (APIs) on a pay-per-query basis. Unfortunately, the success of MLaaS is under threat from model extraction attacks, in which attackers aim to extract a local model that is functionally equivalent to the target MLaaS model. Existing studies on model extraction of text analytics APIs frequently assume that adversaries have strong knowledge of the victim model, such as its architecture and parameters, which hardly holds in practice. Moreover, since the attacker’s and the victim’s training data can differ considerably, performing efficient model extraction is non-trivial. In this paper, to advance the understanding of such attacks, we propose a framework, PEEP, for practical and efficient model extraction of sentiment analysis APIs with only query access. Specifically, PEEP features a learning-based scheme, which employs out-of-domain public corpora and a novel query strategy to construct proxy training data for model extraction. In addition, PEEP introduces a greedy search algorithm to select an appropriate architecture for the extracted model. We conducted extensive experiments with two victim models across three datasets and two real-life commercial sentiment analysis APIs. Experimental results corroborate that PEEP consistently outperforms the state-of-the-art baselines in terms of effectiveness and efficiency.
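
The general idea of extraction with query-only access (label public text by querying the target, then fit a local model on those labels) can be illustrated with a minimal toy sketch. This is not PEEP: the paper's query strategy and greedy architecture search are not reproduced here. Every name below (`victim_api`, `extract_surrogate`, `proxy_corpus`, `held_out`) is a hypothetical stand-in, the "API" is a local keyword rule rather than a real service, and the surrogate is a simple word-count voter chosen only for brevity.

```python
from collections import Counter

# Hypothetical stand-in for a remote sentiment API: the caller sees only labels.
def victim_api(text: str) -> int:
    positive = {"good", "great", "love", "excellent"}
    negative = {"bad", "awful", "hate", "poor"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return 1 if score >= 0 else 0  # 1 = positive, 0 = negative

def extract_surrogate(corpus):
    """Label each public sentence via the victim and count words per returned label."""
    counts = {0: Counter(), 1: Counter()}
    for text in corpus:
        label = victim_api(text)  # one query per sentence
        counts[label].update(text.lower().split())
    return counts

def surrogate_predict(counts, text: str) -> int:
    """Vote with the per-word label counts learned from the victim's answers."""
    score = sum(counts[1][w] - counts[0][w] for w in text.lower().split())
    return 1 if score >= 0 else 0

def agreement(counts, texts) -> float:
    """Fraction of texts on which the surrogate and the victim give the same label."""
    same = sum(surrogate_predict(counts, t) == victim_api(t) for t in texts)
    return same / len(texts)

# Assumed out-of-domain proxy data: everyday remarks rather than movie reviews.
proxy_corpus = [
    "the soup was good",
    "great service and good coffee",
    "i love this phone",
    "excellent battery",
    "the room was bad",
    "awful traffic and poor roads",
    "i hate this printer",
    "poor signal",
]
# Assumed in-domain evaluation sentences.
held_out = ["a great film", "bad acting", "i love the ending", "awful plot"]

surrogate = extract_surrogate(proxy_corpus)
print(agreement(surrogate, held_out))
```

Agreement with the victim on held-out text is one simple way to gauge how closely a surrogate mimics the target. The sketch spends one query per proxy sentence, which is the kind of cost a pay-per-query setting makes important.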

Weibin Wu

Sun Yat-sen University

Jianping Zhang

The Chinese University of Hong Kong

Victor Junqiu Wei

The Hong Kong Polytechnic University

Xixian Chen

Tencent

Zibin Zheng

School of Software Engineering, Sun Yat-sen University

Irwin King

The Chinese University of Hong Kong

Michael Lyu

The Chinese University of Hong Kong

Session Program

Wed 17 May
Displayed time zone: (GMT+10:00) Hobart

13:45 - 15:15	AI systems engineering (SEIP - Software Engineering in Practice / Technical Track / NIER - New Ideas and Emerging Results / Journal-First Papers) at Meeting Room 104. Chair(s): Xin Peng (Fudan University)

13:45 (15m) Talk: FedDebug: Systematic Debugging for Federated Learning Applications. Technical Track. Waris Gill (Virginia Tech), Ali Anwar (University of Minnesota), Muhammad Ali Gulzar (Virginia Tech)
14:00 (15m) Talk: Practical and Efficient Model Extraction of Sentiment Analysis APIs. Technical Track. Weibin Wu (Sun Yat-sen University), Jianping Zhang (The Chinese University of Hong Kong), Victor Junqiu Wei (The Hong Kong Polytechnic University), Xixian Chen (Tencent), Zibin Zheng (School of Software Engineering, Sun Yat-sen University), Irwin King (The Chinese University of Hong Kong), Michael Lyu (The Chinese University of Hong Kong)
14:15 (15m) Talk: CrossCodeBench: Benchmarking Cross-Task Generalization of Source Code Models. Technical Track. Changan Niu (Software Institute, Nanjing University), Chuanyi Li (Nanjing University), Vincent Ng (Human Language Technology Research Institute, University of Texas at Dallas, Richardson, TX 75083-0688), Bin Luo (Nanjing University)
14:30 (15m) Talk: Challenges in Adopting Artificial Intelligence Based User Input Verification Framework in Reporting Software Systems. SEIP - Software Engineering in Practice. Dong Jae Kim (Concordia University), Tse-Hsun (Peter) Chen (Concordia University), Steve Sporea, Andrei Toma (ERA Environmental Management Solutions), Laura Weinkam, Sarah Sajedi (ERA Environmental Management Solutions)
14:45 (7m) Talk: Towards Understanding Quality Challenges of the Federated Learning for Neural Networks: A First Look from the Lens of Robustness. Journal-First Papers. Amin Eslami Abyane (University of Calgary), Derui Zhu (Technical University of Munich), Roberto Souza (University of Calgary), Lei Ma (University of Alberta), Hadi Hemmati (York University)
14:52 (7m) Talk: An Empirical Study of the Impact of Hyperparameter Tuning and Model Optimization on the Performance Properties of Deep Neural Networks. Journal-First Papers. Lizhi Liao (Concordia University), Heng Li (Polytechnique Montréal), Weiyi Shang (University of Waterloo), Lei Ma (University of Alberta)
15:00 (7m) Talk: Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction and Clustering. Journal-First Papers. Mohammed Attaoui (University of Luxembourg), Hazem FAHMY (University of Luxembourg), Fabrizio Pastore (University of Luxembourg), Lionel Briand (University of Luxembourg; University of Ottawa)
15:07 (7m) Talk: Iterative Assessment and Improvement of DNN Operational Accuracy. NIER - New Ideas and Emerging Results. Antonio Guerriero (Università di Napoli Federico II), Roberto Pietrantuono (Università di Napoli Federico II), Stefano Russo (Università di Napoli Federico II)