Bad Snakes: Understanding and Improving Python Package Index Malware Scanning
While attackers often distribute malware to victims via open-source, community-driven package repositories, these repositories do not currently run automated malware detection systems. In this work, we explore the security goals of the repository administrators and the requirements for deployments of such malware scanners via a case study of the Python ecosystem and PyPI repository, which includes interviews with administrators and maintainers. Further, we evaluate existing malware detection techniques for deployment in this setting by creating a benchmark dataset and comparing several existing tools, including the malware checks implemented in PyPI, Bandit4Mal, and OSSGadget’s OSS Detect Backdoor.
We find that repository administrators have exacting technical demands for such malware detection tools. Specifically, they consider a false positive rate of even 0.01% to be unacceptably high, given the large number of package releases that might trigger false alerts. The tools we measured have false positive rates between 15% and 97%; raising the thresholds of the detection rules to reduce this rate lowers the true positive rate to the point where the tools are no longer useful. In some cases, these checks emitted alerts more often for benign packages than for malicious ones. However, we also find a successful socio-technical malware detection system: external security researchers perform their own repository malware scans and report the results to repository administrators. These parties face different incentives and constraints on their time and tooling. We conclude with recommendations for improving detection capabilities and strengthening the collaboration between security researchers and software repository administrators.
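To illustrate why even a small false positive rate can overwhelm administrators, the following is a minimal sketch of the underlying arithmetic. The false positive rates (0.01%, 15%, 97%) come from the abstract above; the weekly release volume, the number of malicious releases, and the perfect true positive rate are hypothetical values chosen only for illustration and are not figures from the paper.

```python
def expected_false_alerts(benign_releases: int, false_positive_rate: float) -> float:
    """Expected number of benign releases that a scanner flags as malicious."""
    return benign_releases * false_positive_rate


def precision(benign: int, malicious: int, fpr: float, tpr: float) -> float:
    """Fraction of all alerts that point at genuinely malicious releases."""
    true_alerts = malicious * tpr
    false_alerts = benign * fpr
    total = true_alerts + false_alerts
    return true_alerts / total if total else 0.0


# ASSUMPTIONS (hypothetical, not from the paper): weekly release volume,
# weekly malicious releases, and an optimistic true positive rate of 100%.
BENIGN_PER_WEEK = 100_000
MALICIOUS_PER_WEEK = 10
ASSUMED_TPR = 1.0

if __name__ == "__main__":
    # False positive rates quoted in the abstract: 0.01%, 15%, and 97%.
    for fpr in (0.0001, 0.15, 0.97):
        alerts = expected_false_alerts(BENIGN_PER_WEEK, fpr)
        prec = precision(BENIGN_PER_WEEK, MALICIOUS_PER_WEEK, fpr, ASSUMED_TPR)
        print(f"FPR {fpr:.2%}: about {alerts:,.0f} false alerts/week, "
              f"precision {prec:.4f}")
```

Under these assumed volumes, the number of false alerts scales linearly with the number of benign releases, so the rare malicious releases are easily outnumbered by false alerts that someone must triage by hand.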
Wed 17 May (displayed time zone: Hobart)
13:45 - 15:15 | Software security and privacy | Technical Track / Journal-First Papers at Meeting Room 103 | Chair(s): Wei Yang (University of Texas at Dallas)
13:45 | 15m | Talk | BFTDetector: Automatic Detection of Business Flow Tampering for Digital Content Service | Technical Track | I Luk Kim (Purdue University), Weihang Wang (University of Southern California), Yonghwi Kwon (University of Virginia), Xiangyu Zhang (Purdue University)
14:00 | 15m | Talk | FedSlice: Protecting Federated Learning Models from Malicious Participants with Model Slicing | Technical Track | Ziqi Zhang (Peking University), Yuanchun Li (Institute for AI Industry Research (AIR), Tsinghua University), Bingyan Liu (Peking University), Yifeng Cai (Peking University), Ding Li (Peking University), Yao Guo (Peking University), Xiangqun Chen (Peking University)
14:15 | 15m | Talk | PTPDroid: Detecting Violated User Privacy Disclosures to Third-Parties of Android Apps | Technical Track | Zeya Tan (Nanjing University of Science and Technology), Wei Song (Nanjing University of Science and Technology) | Pre-print available
14:30 | 15m | Talk | AdHere: Automated Detection and Repair of Intrusive Ads | Technical Track | Yutian Yan (University of Southern California), Yunhui Zheng, Xinyue Liu (University at Buffalo, SUNY), Nenad Medvidović (University of Southern California), Weihang Wang (University of Southern California)
14:45 | 15m | Talk | Bad Snakes: Understanding and Improving Python Package Index Malware Scanning | Technical Track
15:00 | 7m | Talk | DAISY: Dynamic-Analysis-Induced Source Discovery for Sensitive Data | Journal-First Papers | Xueling Zhang (Rochester Institute of Technology), John Heaps (University of Texas at San Antonio), Rocky Slavin (The University of Texas at San Antonio), Jianwei Niu (University of Texas at San Antonio), Travis Breaux (Carnegie Mellon University), Xiaoyin Wang (University of Texas at San Antonio)
15:07 | 7m | Talk | Assessing the opportunity of combining state-of-the-art Android malware detectors | Journal-First Papers | Nadia Daoudi (SnT, University of Luxembourg), Kevin Allix (CentraleSupelec Rennes), Tegawendé F. Bissyandé (SnT, University of Luxembourg), Jacques Klein (University of Luxembourg)