SECOM: Towards a convention for security commit messages (MSR 2022 - Industry Track)

Who

Sofia Reis, Rui Abreu, Hakan Erdogmus, Corina S. Păsăreanu

Track

MSR 2022 Industry Track

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).


When

Fri 20 May 2022 14:11 - 14:18 at MSR Main room - even hours - Session 16: Non-functional Properties (Availability, Security, Legal Aspects) Chair(s): Maxime Lamothe, Jin L.C. Guo

Abstract

Context. Detecting, and especially assessing, software vulnerabilities remains a challenge in the vulnerability prediction field, mainly because curated data is of poor quality and/or scarce [1]. Many studies have aimed to create datasets of security patches from software repository data [2,3,4,5]. However, there are still very few known gold standard datasets for comparing and evaluating the different approaches [6]. One way to detect and assess software vulnerabilities is to extract security-related information from commit messages. Yet automating the detection and assessment of vulnerabilities from security commit messages is still challenging because the messages often lack structure and clarity.

Are security-relevant commit messages informative? We conducted an empirical analysis of 2k security commit messages collected from GitHub commits referenced in CVE reports, and confirmed that 23% of the commit messages used to patch publicly known vulnerabilities are either 1) cryptic/poorly documented, or 2) do not seem security-related (unclear). The results suggest that best practices and templates are needed to help security engineers write better security commit messages and to support further technology development on this type of repository data, i.e., commit messages.

How to write a good security commit message? We searched for conventions or guidelines on writing security commit messages, but found only guidelines for writing better generic commit messages, which do not consider crucial security-related information such as the CWE-ID, the CVE-ID, and the impact/score of the vulnerability, among others. This information is essential for both humans and tools when detecting and assessing vulnerabilities through commit messages. We therefore created a convention for security commit messages that structures and captures information about vulnerabilities.
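
To illustrate why identifiers such as the CVE-ID and CWE-ID matter for tools, here is a minimal Python sketch that pulls them out of a commit message with regular expressions. It is an illustration only, not part of SECOM: the function name `extract_security_ids` and the example message are hypothetical, and the patterns assume the public `CVE-YYYY-NNNN` and `CWE-N` identifier formats.

```python
import re

# Assumed identifier formats: CVE-<year>-<4 or more digits> and CWE-<number>.
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
CWE_RE = re.compile(r"\bCWE-\d+\b", re.IGNORECASE)


def extract_security_ids(message: str) -> dict[str, list[str]]:
    """Return the distinct CVE and CWE identifiers found in a commit message.

    Identifiers are upper-cased and sorted so the result is stable.
    """
    cves = sorted({match.upper() for match in CVE_RE.findall(message)})
    cwes = sorted({match.upper() for match in CWE_RE.findall(message)})
    return {"cve": cves, "cwe": cwes}


# Hypothetical commit message, written for this example only.
example_message = """Fix buffer overflow in image parser (CVE-2021-12345)

The length field was not checked before copying into the buffer.

Weakness: CWE-120
"""

if __name__ == "__main__":
    print(extract_security_ids(example_message))
```

A message that names no identifier yields two empty lists, which is the situation the empirical analysis describes as unclear or poorly documented: a tool has nothing to anchor on.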

SECOM: A convention for security commit messages. To facilitate adoption, the convention builds on well-known sources on writing better commit messages, which are listed at the end of our website. The structure and set of fields included in the convention were derived 1) from the conclusions of our empirical analysis of security-related commit messages, and 2) from feedback collected by presenting SECOM in two Open Source Security Foundation working groups. The full convention, details, and examples are available here: https://tqrg.github.io/secom/.

Feedback and Future Ideas. In general, the community sees value in SECOM and would like to see it become a standard practice. We are currently working with the Open Source Vulnerability Database team at Google to gather internal feedback from their teams. Writing more structured and informative commit messages for vulnerability disclosure/patching will further the detection and assessment of security vulnerabilities through commit messages. In the future, new technologies can be developed on top of SECOM to boost team productivity, such as tools that assess compliance or that assist developers in writing better commit messages through recommendations and auto-completion.
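
As a rough idea of what a compliance tool might look like, the Python sketch below reports which required `Key: value` fields are missing from a commit message. The field names in `REQUIRED_FIELDS` are illustrative placeholders and are not taken from the SECOM specification; the actual fields are defined at https://tqrg.github.io/secom/. The function name `missing_fields` is likewise hypothetical.

```python
# Placeholder field names for illustration only; NOT the SECOM field list.
REQUIRED_FIELDS = ("Weakness", "Severity")


def missing_fields(message: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that have no non-empty 'Key: value' line."""
    present = set()
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        # Count a field only if the line has a colon and a non-empty value.
        if sep and value.strip():
            present.add(key.strip().lower())
    return [field for field in required if field.lower() not in present]


if __name__ == "__main__":
    # Hypothetical commit message that fills in only one of the two fields.
    message = "Fix overflow in parser\n\nWeakness: CWE-120\n"
    print(missing_fields(message))
```

A check like this could run in a commit hook or CI job and reject or flag messages whose list of missing fields is not empty.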

Link to Preprint

https://tqrg.github.io/secom/static/media/msr22.0d95603005e6c291d52e.pdf

Sofia Reis

Instituto Superior Técnico, U. Lisboa & INESC-ID

Portugal

Rui Abreu

Faculty of Engineering, University of Porto, Portugal

Portugal

Hakan Erdogmus

Carnegie Mellon University

United States

Corina S. Păsăreanu

Carnegie Mellon University

United States


Session Program

Fri 20 May
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:00	Session 16: Non-functional Properties (Availability, Security, Legal Aspects). Industry Track / Technical Papers / Registered Reports / Data and Tool Showcase Track, at MSR Main room - even hours. Chair(s): Maxime Lamothe (Polytechnique Montreal, Montreal, Canada), Jin L.C. Guo (McGill University)

14:00 7m Talk		A Deep Study of the Effects and Fixes of Server-Side Request Races in Web Applications (Technical Papers). Zhengyi Qiu (North Carolina State University), Shudi Shao (North Carolina State University), Qi Zhao (North Carolina State University), Hassan Ali Khan (North Carolina State University), Xinning Hui (North Carolina State University), Guoliang Jin (North Carolina State University)
14:07 4m Talk		A Large-scale Dataset of (Open Source) License Text Variants (Data and Tool Showcase Track; Data and Tool Showcase Award). Stefano Zacchiroli (Télécom Paris, Polytechnic Institute of Paris)
14:11 7m Talk		SECOM: Towards a convention for security commit messages (Industry Track; FOSS Impact Paper Award). Sofia Reis (Instituto Superior Técnico, U. Lisboa & INESC-ID), Rui Abreu (Faculty of Engineering, University of Porto, Portugal), Hakan Erdogmus (Carnegie Mellon University), Corina S. Păsăreanu (Carnegie Mellon University)
14:18 7m Talk		Varangian: A Git Bot for Augmented Static Analysis (Industry Track). Saurabh Pujar (IBM Research), Yunhui Zheng (IBM Research), Luca Buratti (IBM Research), Burn Lewis (IBM Research), Alessandro Morari (IBM Research), Jim A. Laredo (IBM Research), Kevin Postlethwait (Red Hat), Christoph Görn (Red Hat)
14:25 7m Talk		Detecting Privacy-Sensitive Code Changes with Language Modeling (Industry Track). Gökalp Demirci (Meta Platforms, Inc.), Vijayaraghavan Murali (Meta Platforms, Inc.), Imad Ahmad (Meta Platforms, Inc.), Rajeev Rao (Meta Platforms, Inc.), Gareth Ari Aye (Meta Platforms, Inc.)
14:32 4m Talk		Is GitHub's Copilot as Bad As Humans at Introducing Vulnerabilities in Code? (Registered Reports). Owura Asare (University of Waterloo), Mei Nagappan (University of Waterloo), N. Asokan (University of Waterloo)
14:36 7m Talk		Finding the Fun in Fundraising: Public Issues and Pull Requests in VC-backed Open-Core Companies (Industry Track). Kevin Xu (GitHub)
14:43 17m Live Q&A		Discussions and Q&A (Technical Papers)
