Re(gEx|DoS)Eval: Evaluating Generated Regular Expressions and their Proneness to DoS Attacks (ICSE 2024 - Artifact Evaluation)

ICSE 2024

Fri 12 - Sun 21 April 2024 Lisbon, Portugal



Who

Mohammed Latif Siddiq, Jiahao Zhang, Lindsay Roney, Joanna C. S. Santos

Track

ICSE 2024 Artifact Evaluation

Abstract

With the recent development of large language model-based text and code generation technologies, users are applying them to a vast range of tasks, including regex generation. Despite efforts to generate regexes from natural language, there is no prompt benchmark for LLMs with real-world data and robust test sets. Moreover, a regex can be prone to Denial of Service (DoS) attacks due to catastrophic backtracking. Hence, we need a systematic process to evaluate the correctness and security of regexes generated by language models. This artifact accompanies our ICSE-NIER paper, in which we describe Re(gEx|DoS)Eval: a framework that includes a dataset of 762 regex descriptions (prompts) from real users, refined prompts with examples, and a robust set of tests. We introduce the pass@k and vulnerable@k metrics to evaluate the generated regexes based on functional correctness and proneness to ReDoS attacks. Moreover, we demonstrate Re(gEx|DoS)Eval with three language models, i.e., T5, Phi-1.5, and GPT-3, and describe the plan for the future extension of this framework.
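The abstract names the pass@k and vulnerable@k metrics without defining them. As a rough illustration only, the sketch below assumes that pass@k follows the unbiased estimator commonly used for code generation benchmarks (the probability that at least one of k samples, drawn from n generated candidates, is correct) and that vulnerable@k is the analogous quantity for ReDoS-prone regexes. Both definitions, the function names, and the example counts are assumptions and are not taken from the artifact itself.

```python
from math import comb


def at_least_one_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn without
    replacement from n candidates, has a property held by c of them.

    Computed as 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k(n: int, num_correct: int, k: int) -> float:
    # Assumed definition: at least one of k sampled regexes passes all tests.
    return at_least_one_at_k(n, num_correct, k)


def vulnerable_at_k(n: int, num_vulnerable: int, k: int) -> float:
    # Assumed definition: at least one of k sampled regexes is ReDoS-prone.
    return at_least_one_at_k(n, num_vulnerable, k)


if __name__ == "__main__":
    # Hypothetical counts for a single prompt: 10 generated regexes,
    # 3 functionally correct, 2 flagged as vulnerable.
    for k in (1, 3, 10):
        print(k, pass_at_k(10, 3, k), vulnerable_at_k(10, 2, k))
```

Under these assumptions, a per-prompt score would be averaged over all prompts in the dataset. A higher pass@k is better, while a higher vulnerable@k is worse.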

Mohammed Latif Siddiq

University of Notre Dame

United States

Jiahao Zhang

Lindsay Roney

University of Notre Dame

Joanna C. S. Santos

University of Notre Dame

United States
