ICSME 2023
Sun 1 - Fri 6 October 2023 Bogotá, Colombia

Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by “silent” security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commits; however, the existing datasets and methods are insufficient for security commit detection in Python, due to the limited data variety, non-comprehensive code semantics, and uninterpretable learned features. In this paper, we construct the first security commit dataset in Python, namely PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset. The base dataset contains the security commits associated with CVE records provided by MITRE. To increase the variety of security commits, we build the pilot dataset from GitHub by filtering keywords within the commit messages. Since not all commits provide commit messages, we further construct the augmented dataset by understanding the semantics of code changes. To build the augmented dataset, we propose a new graph representation named CommitCPG and a multi-attributed graph learning model named SCOPY to identify the security commit candidates through both sequential and structural code semantics. The evaluation shows our proposed algorithms can improve the data collection efficiency by up to 40 percentage points. After manual verification by three security experts, PySecDB consists of 1,258 security commits and 2,791 non-security commits. Furthermore, we conduct an extensive case study on PySecDB and discover four common security fix patterns that cover over 85% of security commits in Python, providing insight into secure software maintenance, vulnerability detection, and automated program repair.
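
The keyword-filtering step used to build the pilot dataset can be illustrated with a minimal sketch. The keyword list, the function names, and the dictionary layout of a commit below are assumptions made for illustration; the abstract does not state the actual keywords or tooling used for PySecDB.

```python
import re

# Hypothetical keyword list, chosen for illustration only.
# Matching is by substring, so "vulnerab" covers "vulnerable" and "vulnerability".
SECURITY_KEYWORDS = ("security", "vulnerab", "cve", "xss", "injection", "overflow")

# One case-insensitive pattern that matches any of the keywords.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in SECURITY_KEYWORDS), re.IGNORECASE
)


def is_security_candidate(message):
    """Return True if the commit message mentions any security keyword."""
    return bool(message) and _PATTERN.search(message) is not None


def filter_candidates(commits):
    """Keep commits (dicts with an assumed 'message' key) that match a keyword."""
    return [c for c in commits if is_security_candidate(c.get("message") or "")]
```

Commits with an empty message never match this filter, which is the gap the abstract says the augmented dataset addresses by analyzing the code changes themselves.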

Thu 5 Oct

Displayed time zone: Bogota, Lima, Quito, Rio Branco

13:30 - 15:00
Security and Program Repair (Research Track / Industry Track) at Session 1 Room - RGD 004
Chair(s): Quentin Stiévenart Université du Québec à Montréal (UQAM), Ashkan Sami Edinburgh Napier University
13:30
16m
Talk
Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework
Research Track
Sichong Hao Faculty of Computing, Harbin Institute of Technology, Xianjun Shi Faculty of Computing, Harbin Institute of Technology, Hongwei Liu Faculty of Computing, Harbin Institute of Technology, Yanjun Shu Faculty of Computing, Harbin Institute of Technology
13:46
16m
Talk
ScaleFix: An Automated Repair of UI Scaling Accessibility Issues in Android Applications
Research Track
Ali S. Alotaibi University of Southern California, Paul T. Chiou University of Southern California, Fazle Mohammed Tawsif University of Southern California, William G.J. Halfond University of Southern California
14:02
16m
Talk
Finding an Optimal Set of Static Analyzers To Detect Software Vulnerabilities
Industry Track
Jiaqi He University of Alberta, Revan MacQueen University of Alberta, Natalie Bombardieri University of Alberta, Karim Ali University of Alberta, James Wright University of Alberta, Cristina Cifuentes Oracle Labs
14:18
16m
Talk
DockerCleaner: Automatic Repair of Security Smells in Dockerfiles
Research Track
Quang-Cuong Bui Hamburg University of Technology, Malte Laukötter Hamburg University of Technology, Riccardo Scandariato Hamburg University of Technology
Pre-print
14:34
16m
Talk
Exploring Security Commits in Python
Research Track
Shiyu Sun George Mason University, Shu Wang George Mason University, Xinda Wang George Mason University, Yunlong Xing George Mason University, Elisa Zhang Dougherty Valley High School, Kun Sun George Mason University
Pre-print
14:50
10m
Live Q&A
1:1 Q&A
Research Track