Exploring Security Commits in Python (ICSME 2023 - Research Track)

Who

Shiyu Sun, Shu Wang, Xinda Wang, Yunlong Xing, Elisa Zhang, Kun Sun

Track

ICSME 2023 Research Track

Time Zone

The program is currently displayed in (GMT-05:00) Bogota, Lima, Quito, Rio Branco.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 5 Oct 2023 14:34 - 14:50 at Session 1 Room - RGD 004 - Security and Program Repair Chair(s): Quentin Stiévenart, Ashkan Sami

Abstract

Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by “silent” security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commits; however, the existing datasets and methods are insufficient for security commit detection in Python, due to the limited data variety, non-comprehensive code semantics, and uninterpretable learned features. In this paper, we construct the first security commit dataset in Python, namely PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset. The base dataset contains the security commits associated with CVE records provided by MITRE. To increase the variety of security commits, we build the pilot dataset from GitHub by filtering keywords within the commit messages. Since not all commits provide commit messages, we further construct the augmented dataset by understanding the semantics of code changes. To build the augmented dataset, we propose a new graph representation named CommitCPG and a multi-attributed graph learning model named SCOPY to identify the security commit candidates through both sequential and structural code semantics. The evaluation shows our proposed algorithms can improve the data collection efficiency by up to 40 percentage points. After manual verification by three security experts, PySecDB consists of 1,258 security commits and 2,791 non-security commits. Furthermore, we conduct an extensive case study on PySecDB and discover four common security fix patterns that cover over 85% of security commits in Python, providing insight into secure software maintenance, vulnerability detection, and automated program repair.
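The keyword-filtering step used for the pilot dataset can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the `is_candidate` and `filter_candidates` helpers, and the dictionary layout of a commit are all assumptions made for illustration. It also shows why message-based filtering alone is not enough: commits with an empty message can never match, which is the gap the augmented dataset is meant to close.

```python
import re

# Hypothetical keyword stems, chosen for illustration only; the abstract
# does not list the keywords actually used to build the pilot dataset.
SECURITY_KEYWORDS = ["security", "vulnerab", "cve", "xss", "injection", "overflow"]

# Match any stem at the start of a word, case-insensitively.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in SECURITY_KEYWORDS) + r")",
    re.IGNORECASE,
)


def is_candidate(message: str) -> bool:
    """Return True if the commit message mentions a security keyword."""
    return bool(message) and _PATTERN.search(message) is not None


def filter_candidates(commits):
    """Keep commits (dicts with an assumed 'message' key) that look security-related."""
    return [c for c in commits if is_candidate(c.get("message", ""))]


commits = [
    {"sha": "a1", "message": "Fix XSS in template rendering"},
    {"sha": "b2", "message": "Update README"},
    {"sha": "c3", "message": ""},  # no message: invisible to keyword filtering
    {"sha": "d4", "message": "Patch CVE-2023-0001: path traversal"},
]
candidates = filter_candidates(commits)
```

Commits selected this way are only candidates; in the paper, the final labels come from manual verification by security experts.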

Link to Preprint

https://arxiv.org/pdf/2307.11853.pdf

Shiyu Sun, George Mason University, United States
Shu Wang, George Mason University, United States
Xinda Wang, George Mason University, United States
Yunlong Xing, George Mason University, United States
Elisa Zhang, Dougherty Valley High School, United States
Kun Sun, George Mason University, United States

Session Program

Thu 5 Oct
Displayed time zone: Bogota, Lima, Quito, Rio Branco

13:30 - 15:00	Security and Program Repair (Research Track / Industry Track) at Session 1 Room - RGD 004
Chair(s): Quentin Stiévenart, Université du Québec à Montréal (UQAM); Ashkan Sami, Edinburgh Napier University

13:30 16m Talk	Enhancing Code Language Models for Program Repair by Curricular Fine-tuning Framework (Research Track)
	Sichong Hao, Faculty of Computing, Harbin Institute of Technology; Xianjun Shi, Faculty of Computing, Harbin Institute of Technology; Hongwei Liu, Faculty of Computing, Harbin Institute of Technology; Yanjun Shu, Faculty of Computing, Harbin Institute of Technology
13:46 16m Talk	ScaleFix: An Automated Repair of UI Scaling Accessibility Issues in Android Applications (Research Track)
	Ali S. Alotaibi, University of Southern California; Paul T. Chiou, University of Southern California; Fazle Mohammed Tawsif, University of Southern California; William G.J. Halfond, University of Southern California
14:02 16m Talk	Finding an Optimal Set of Static Analyzers To Detect Software Vulnerabilities (Industry Track)
	Jiaqi He, University of Alberta; Revan MacQueen, University of Alberta; Natalie Bombardieri, University of Alberta; Karim Ali, University of Alberta; James Wright, University of Alberta; Cristina Cifuentes, Oracle Labs
14:18 16m Talk	DockerCleaner: Automatic Repair of Security Smells in Dockerfiles (Research Track)
	Quang-Cuong Bui, Hamburg University of Technology; Malte Laukötter, Hamburg University of Technology; Riccardo Scandariato, Hamburg University of Technology
14:34 16m Talk	Exploring Security Commits in Python (Research Track)
	Shiyu Sun, George Mason University; Shu Wang, George Mason University; Xinda Wang, George Mason University; Yunlong Xing, George Mason University; Elisa Zhang, Dougherty Valley High School; Kun Sun, George Mason University
14:50 10m Live Q&A	1:1 Q&A (Research Track)