Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums (ICPC 2024 - Research Track)

Sun 14 - Sat 20 April 2024 Lisbon, Portugal

co-located with ICSE 2024

Who

Mohammed Latif Siddiq, Jiahao Zhang, Joanna C. S. Santos

Track

ICPC 2024 Research Track

Time Zone

The program is currently displayed in (GMT+01:00) Lisbon.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 15 Apr 2024 16:20 - 16:30 at Sophia de Mello Breyner Andresen - Empirical + Human Studies Chair(s): Michalis Famelis

Abstract

Regular expression Denial of Service (ReDoS) represents an algorithmic complexity attack that exploits the processing of regular expressions (regexes) to produce a denial-of-service attack. This attack manifests when regex evaluation time scales polynomially or exponentially with input length, posing sporadic yet significant challenges for software developers. The advent of Large Language Models (LLMs) has revolutionized the generation of regexes from natural language prompts, but not without its risks. Prior works showed that LLMs can generate code with vulnerabilities and security smells. In this paper, we synthesized a vast collection of regex patterns from a comprehensive dataset, assessing their correctness and ReDoS vulnerability. We investigated the characteristics of these vulnerable regexes, categorizing them into equivalence classes to unravel their weaknesses. Our inquiry also extended to examining ReDoS patterns in actual software projects, aligning them with corresponding regex classes. LLM-generated regexes mainly have polynomial ReDoS vulnerability patterns, and it is consistent with the real-world data. Moreover, we analyzed developer dialogues on GitHub and StackOverflow, constructing a taxonomy to investigate their experiences and perspectives on ReDoS. In this study, we found that GPT-3.5 was the best LLM to generate regexes that are both correct and secure. We also found that developers’ main concern is related to mitigation strategies to remove vulnerable regexes.
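The growth in matching time described in the abstract can be observed with a small experiment. The sketch below is illustrative only: both patterns are hypothetical and are not taken from the paper's dataset, and it uses a classic exponential case (nested quantifiers), whereas the abstract reports that LLM-generated regexes mostly show polynomial patterns. It assumes a backtracking engine such as Python's built-in `re` module.

```python
import re
import time

# Hypothetical patterns, not taken from the paper's dataset.
# Nested quantifiers over the same character let a backtracking matcher
# try exponentially many ways to split a run of 'a' characters.
VULNERABLE = re.compile(r"^(a+)+$")
# Accepts the same strings without the nested quantifier.
SAFE = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (matched, elapsed_seconds) for a single match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


def compare(max_n=18):
    """Time both patterns on non-matching inputs of growing length."""
    rows = []
    for n in range(10, max_n + 1, 2):
        text = "a" * n + "!"  # the trailing '!' forces the match to fail
        _, slow = time_match(VULNERABLE, text)
        _, fast = time_match(SAFE, text)
        rows.append((n, slow, fast))
    return rows


if __name__ == "__main__":
    for n, slow, fast in compare():
        print(f"n={n:2d}  vulnerable={slow:.4f}s  safe={fast:.6f}s")
```

Exact timings depend on the machine and interpreter version, so none are shown here. Keep `max_n` small when trying this, because the vulnerable pattern's running time grows very quickly with input length.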

Link to Preprint

https://s2e-lab.github.io/preprints/icpc_24.pdf

DOI

https://doi.org/10.1145/3643916.3644424

File attachments

Presentation Slides (ICPC'24.pptx)	5.3MiB

Mohammed Latif Siddiq

University of Notre Dame

United States

Jiahao Zhang

Joanna C. S. Santos

University of Notre Dame

United States

Presentation Slides (Online Copy)

Session Program

Mon 15 Apr
Displayed time zone: Lisbon

	16:00 - 17:30	Empirical + Human Studies. Research Track / Early Research Achievements (ERA) / Replications and Negative Results (RENE), at Sophia de Mello Breyner Andresen. Chair(s): Michalis Famelis (Université de Montréal)

	16:00 10m Talk		CRSP: Emulating Human Cooperative Reasoning for Intelligible Story Point Estimation (ICPC Full paper, Research Track). Rui Han, Wanjiang Han (Beijing University of Posts and Telecommunications), Zhuoyan Han (Beijing University of Posts and Telecommunications), Yifan Tian (Beijing University of Posts and Telecommunications), Longzheng Chen (Beijing University of Posts and Telecommunications), Ren Han (Beijing University of Posts and Telecommunications)
	16:10 10m Talk		What Do Developers Feel About Fast-Growing Programming Languages? An Exploratory Study (ICPC Full paper, Research Track). Jahnavi Kumar (Indian Institute of Technology Tirupati, India), Sridhar Chimalakonda (Associate Professor, Indian Institute of Technology Tirupati; Adjunct Associate Professor, University of Waterloo)
	16:20 10m Talk		Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums (ICPC Full paper, Research Track). Mohammed Latif Siddiq (University of Notre Dame), Jiahao Zhang, Joanna C. S. Santos (University of Notre Dame)
	16:30 10m Talk		Exploring Social Signals in Code Review: An Eye-Tracking Study of Urgency and Reputation Effects (ICPC Full paper, Research Track). Sara Yabesi (Polytechnique Montreal), Mahta Amini (Polytechnique Montreal), Jelena Ristic (McGill University), Zohreh Sharafi (Polytechnique Montréal)
	16:40 10m Talk		On the comprehensibility of functional decomposition: An empirical study (ICPC RENE Paper, Replications and Negative Results (RENE)). Ewan Tempero (University of Auckland), Paul Denny (The University of Auckland), James Finnie-Ansley (The University of Auckland), Andrew Luxton-Reilly (The University of Auckland), Diana Kirk (University of Auckland), Juho Leinonen (Aalto University), Asma Shakil (The University of Auckland), Robert Sheehan (The University of Auckland), James Tizard (University of Auckland), Yu-Cheng Tu (The University of Auckland), Burkhard Wünsche (University of Auckland)
	16:50 10m Talk		Reassessing Java Code Readability Models with a Human-Centered Approach (ICPC RENE Paper, Replications and Negative Results (RENE)). Agnia Sergeyuk (JetBrains Research), Olga Lvova (JetBrains), Sergey Titov (JetBrains Research), Anastasiia Serova (JetBrains), Farid Bagirov (JetBrains Research), Evgeniia Kirillova (JetBrains Research), Timofey Bryksin (JetBrains Research)
	17:00 8m Talk		Exploring the Impact of Source Code Linearity on the Programmers' Comprehension of API Code Examples (ICPC ERA Paper, Virtual Talk, Early Research Achievements (ERA)). Seham Alharbi (University of York), Dimitris Kolovos (University of York)
	17:08 8m Talk		Innovating Coding: Evaluating the Impact of Innovative Thinking in Programming (ICPC ERA Paper, Early Research Achievements (ERA)). Anthonia Njoku (Polytechnique Montreal), Mahta Amini (Polytechnique Montreal), Zohreh Sharafi (Polytechnique Montréal)
	17:16 14m Talk		Empirical + Human Studies: Panel with Speakers (ICPC Discussion)