EASE 2024
Tue 18 - Fri 21 June 2024, Salerno, Italy

Discovering and mitigating software vulnerabilities is a challenging task. Security issues are often hidden in complex software systems and remain undetected for a long time until someone exploits them. These defects often stem from simple code snippets that would be harmless in other contexts, such as an unchecked path traversal. Large Language Models (LLMs) promise to revolutionize not just human-machine interaction but various software engineering tasks as well, including the automatic repair of vulnerabilities. However, it is currently hard to assess the performance, robustness, and reliability of these models, as most of their evaluation has been done on small, synthetic examples far removed from the real-world issues developers face in their daily work. In our work, we systematically evaluate the automatic vulnerability fixing capabilities of GPT-4, a popular LLM, using Vul4J, a database of real-world Java vulnerabilities. We prompt the model to produce fixes for vulnerable methods and evaluate its responses both manually and against the unit tests included in the Vul4J database. For at least 12 of the 46 examined vulnerabilities, GPT-4 consistently provided perfect fixes that could be applied as-is. In an additional 5 cases, the accompanying textual instructions would help fix the vulnerability in a practical scenario, even though the generated code itself was incorrect. In line with prior work, our findings also show that prompting has a significant effect on the results.
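To illustrate the kind of defect mentioned above, the following minimal Java sketch (a hypothetical example; the class and directory names are illustrative and not taken from Vul4J or the paper) shows an unchecked path traversal and one common way to guard against it: normalize the resolved path and verify that it stays inside the intended base directory.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class UploadReader {
        private static final Path BASE_DIR = Paths.get("/var/www/uploads");

        // Vulnerable: a request such as "../../etc/passwd" escapes BASE_DIR.
        public static byte[] readUnchecked(String userSuppliedName) throws IOException {
            return Files.readAllBytes(BASE_DIR.resolve(userSuppliedName));
        }

        // Guarded: normalize the resolved path and reject anything outside BASE_DIR.
        public static byte[] readChecked(String userSuppliedName) throws IOException {
            Path resolved = BASE_DIR.resolve(userSuppliedName).normalize();
            if (!resolved.startsWith(BASE_DIR)) {
                throw new SecurityException("Path traversal attempt: " + userSuppliedName);
            }
            return Files.readAllBytes(resolved);
        }
    }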