A Comprehensive Study of Learning-based Android Malware Detectors under Challenging Environments (ICSE 2024 - Research Track)

Who

Gao Cuiying, Gaozhun Huang, Heng Li, Bang Wu, Yueming Wu, Wei Yuan

Track

ICSE 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 Apr 2024 14:15 - 14:30 at Luis de Freitas Branco - LLM, NN and other AI technologies 1 Chair(s): Shin Yoo

Abstract

Recent years have witnessed the proliferation of learning-based Android malware detectors. These detectors can be categorized into three types: string-based, image-based, and graph-based. Most of them achieve good detection performance under ideal settings. In reality, however, detectors often face out-of-distribution samples due to factors such as code obfuscation, concept drift (e.g., the evolution of software development techniques and the emergence of new malware families), and adversarial examples (AEs). This problem has attracted increasing attention, but there is a lack of comparative studies that evaluate the various existing types of detectors under these challenging environments. To fill this gap, we select 12 representative detectors from the three types and evaluate them in challenging scenarios involving code obfuscation, concept drift, and AEs, respectively. Experimental results reveal that none of the evaluated detectors maintains its ideal-setting detection performance, and that the performance of different types of detectors varies significantly across these challenging environments. We identify several factors contributing to the performance deterioration of detectors, including the limitations of feature extraction methods and learning models. We also analyze why detectors of different types show significant performance differences when facing code obfuscation, concept drift, and AEs. Finally, we provide practical suggestions from the perspectives of users and researchers, respectively. We hope our work can help readers understand the different types of detectors and provide guidance for enhancing their performance and robustness.

Gao Cuiying

Huazhong University of Science and Technology

Gaozhun Huang

Huazhong University of Science and Technology

Heng Li

Huazhong University of Science and Technology

Bang Wu

Huazhong University of Science and Technology

Yueming Wu

Nanyang Technological University

Wei Yuan

Huazhong University of Science and Technology

Session Program

Wed 17 Apr
Displayed time zone: Lisbon

14:00 - 15:30	LLM, NN and other AI technologies 1 (Journal-first Papers / Research Track / New Ideas and Emerging Results) at Luis de Freitas Branco Chair(s): Shin Yoo Korea Advanced Institute of Science and Technology

14:00 15m Talk		EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning Research Track Liuqing Chen Zhejiang University, Yunnong Chen Zhejiang University, Shuhong Xiao, Yaxuan Song Zhejiang University, Lingyun Sun Zhejiang University, Yankun Zhen Alibaba Group, Tingting Zhou Alibaba Group, Yanfang Chang Alibaba Group
14:15 15m Talk		A Comprehensive Study of Learning-based Android Malware Detectors under Challenging Environments Research Track Gao Cuiying Huazhong University of Science and Technology, Gaozhun Huang Huazhong University of Science and Technology, Heng Li Huazhong University of Science and Technology, Bang Wu Huazhong University of Science and Technology, Yueming Wu Nanyang Technological University, Wei Yuan Huazhong University of Science and Technology
14:30 15m Talk		Toward Automatically Completing GitHub Workflows Research Track Antonio Mastropaolo Università della Svizzera italiana, Fiorella Zampetti University of Sannio, Italy, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Massimiliano Di Penta University of Sannio, Italy
14:45 15m Talk		UniLog: Automatic Logging via LLM and In-Context Learning Research Track Junjielong Xu The Chinese University of Hong Kong, Shenzhen, Ziang Cui Southeast University, Yuan Zhao Peking University, Xu Zhang Microsoft Research, Shilin He Microsoft Research, Pinjia He Chinese University of Hong Kong, Shenzhen, Liqun Li Microsoft Research, Yu Kang Microsoft Research, Qingwei Lin Microsoft, Yingnong Dang Microsoft Azure, Saravan Rajmohan Microsoft 365, Dongmei Zhang Microsoft Research
15:00 7m Talk		Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules Journal-first Papers Steve Kommrusch Leela AI, Martin Monperrus KTH Royal Institute of Technology, Louis-Noël Pouchet Colorado State University
15:07 7m Talk		NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR Journal-first Papers Orlando Amaral University of Luxembourg, Muhammad Ilyas Azeem University of Luxembourg, Sallam Abualhaija University of Luxembourg, Lionel Briand University of Ottawa, Canada; Lero centre, University of Limerick, Ireland
15:14 7m Talk		Exploring ChatGPT for Toxicity Detection in GitHub New Ideas and Emerging Results Shyamal Mishra Drexel University, Preetha Chatterjee Drexel University, USA