ICSE 2026
Sun 12 - Sat 18 April 2026 Rio de Janeiro, Brazil

This program is tentative and subject to change.

Thu 16 Apr 2026 14:15 - 14:30 at Oceania II - Testing and Analysis 12 Chair(s): Sam Malek

To generate valid test inputs for a system, one needs a \emph{specification} of its input language—typically a \emph{context-free grammar} that describes the input syntax. But where can one obtain such a grammar? In recent years, the field of \emph{input grammar mining} has emerged, with creative approaches to extract input grammars from inputs, code, or both. But how good are these approaches? In particular, how \emph{accurate} are the grammars they mine?

In this study, we systematically \emph{evaluate} grammar miners with respect to these questions. Notably, we find that the previous evaluations conducted by the respective authors—producing a set of inputs from a golden grammar and having them checked by the mined grammar, or vice versa—are insufficient, as they have a strong bias towards short, possibly unrealistic inputs. We therefore also measure the \emph{diversity} of the mined grammars using \emph{$k$-path coverage} with varying depths~$k$ to determine how many \emph{combinations} of grammar elements are actually represented.
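
The short-input bias described above can be illustrated with a minimal sketch. Here `golden_accepts` and `mined_accepts` are hypothetical stand-in parsers for two toy languages, not the study's actual subjects; only the precision/recall bookkeeping of the criticized evaluation scheme is shown.

```python
# Sketch of the sample-and-check evaluation scheme: draw inputs from one
# grammar, have them checked by a parser for the other grammar.
# (Illustrative stand-ins; not the authors' implementation.)

def precision(mined_samples, golden_accepts):
    """Fraction of mined-grammar inputs that the golden grammar accepts."""
    return sum(map(golden_accepts, mined_samples)) / len(mined_samples)

def recall(golden_samples, mined_accepts):
    """Fraction of golden-grammar inputs that the mined grammar accepts."""
    return sum(map(mined_accepts, golden_samples)) / len(golden_samples)

# Toy languages: the golden grammar accepts any digit string,
# while the (flawed) mined grammar accepts only single digits.
golden_accepts = lambda s: s.isdigit()
mined_accepts = lambda s: s.isdigit() and len(s) == 1

# If sampling is biased towards short inputs, the flaw stays hidden:
print(recall(["1", "7", "9"], mined_accepts))          # -> 1.0 (looks perfect)
print(recall(["1", "42", "7", "123"], mined_accepts))  # -> 0.5 (longer inputs reveal it)
```

The same bias affects precision in the other direction: a mined grammar that only produces trivial inputs can score perfectly against the golden parser.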

Ideally, a mined grammar should have perfect precision and recall regardless of the depth $k$. However, our results show that for all approaches presented so far, precision and recall can drop significantly compared to reported results when increasing~$k$ and thus checking for ``deeper'' diversity, especially for complex input languages such as Lisp, JSON, or Tiny-C. For instance, the Tiny-C grammar mined by Arvada achieves a precision of 75% when considering $k$-paths with $k = 1$ (the originally reported precision was 73%), but this drops to 46% for $k = 5$. White-box approaches based on program analysis, such as Mimid and Stalagmite, are more stable with varying depth $k$, but can be challenged by complex parsers such as mjs. Raising the bar for evaluation, our study shows that there is still room for improvement in grammar mining.
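
The idea of checking ``deeper'' diversity via $k$-paths can be sketched as enumerating length-$k$ chains of nonterminals that can occur along derivation-tree paths. This is a simplified illustration of the metric (the published $k$-path definition also tracks terminals and alternatives); the grammar format and function names are illustrative, not the study's implementation.

```python
# Sketch: enumerate length-k chains of nonterminals reachable in a grammar.
# A grammar maps each nonterminal to a list of expansions (symbol lists).

def children(grammar, nt):
    """Nonterminals that can appear directly under `nt` in some expansion."""
    return {sym for expansion in grammar[nt]
            for sym in expansion if sym in grammar}

def k_paths(grammar, start, k):
    """All length-k parent-to-child chains of nonterminals reachable from `start`."""
    # Collect all nonterminals reachable from the start symbol.
    reachable, frontier = {start}, [start]
    while frontier:
        for c in children(grammar, frontier.pop()):
            if c not in reachable:
                reachable.add(c)
                frontier.append(c)
    # Grow chains downward until they have length k.
    paths = set()
    def walk(path):
        if len(path) == k:
            paths.add(tuple(path))
            return
        for c in children(grammar, path[-1]):
            walk(path + [c])
    for nt in reachable:   # a chain may start at any reachable nonterminal
        walk([nt])
    return paths

# Toy JSON-like grammar: values nest inside objects and arrays.
grammar = {
    "<value>":  [["<object>"], ["<array>"], ["NUMBER"]],
    "<object>": [["{", "<value>", "}"]],
    "<array>":  [["[", "<value>", "]"]],
}

print(sorted(k_paths(grammar, "<value>", 2)))
```

Larger $k$ demands that the mined grammar reproduce deeper nesting combinations, which is exactly where the evaluated miners lose precision and recall.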


Thu 16 Apr

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
Testing and Analysis 12
Research Track at Oceania II
Chair(s): Sam Malek University of California at Irvine
14:00
15m
Talk
Generator Solving for Symbolic Execution
Research Track
Siwei Wei State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, and University of Chinese Academy of Sciences Beijing, China, Yan Cai Institute of Software at Chinese Academy of Sciences
14:15
15m
Talk
How Good are Input Grammar Miners? An Empirical Study
Research Track
Leon Bettscheider CISPA Helmholtz Center for Information Security, Andreas Zeller CISPA Helmholtz Center for Information Security
14:30
15m
Talk
LSPRAG: LSP-Guided RAG for Language-Agnostic Real-Time Unit Test Generation
Research Track
Gwihwan Go Tsinghua University, Quan Zhang East China Normal University, Chijin Zhou East China Normal University, Zhao Wei Tencent, Yu Jiang Tsinghua University
14:45
15m
Talk
Breaking Single-Tester Limits: Multi-Agent LLMs for Multi-User Feature Testing
Research Track
Sidong Feng Monash University, Changhao Du Jilin University, Huaxiao Liu Jilin University, Qingnan Wang Jilin University, Zhengwei Lv ByteDance, Mengfei Wang ByteDance, Chunyang Chen TU Munich
15:00
15m
Talk
Testing Deep Learning Libraries via Neurosymbolic Constraint Learning
Research Track
M M Abid Naziri North Carolina State University, Shinhae Kim Cornell University, Feiran Qin North Carolina State University, Saikat Dutta Cornell University, Marcelo d'Amorim North Carolina State University
15:15
15m
Talk
MioHint: LLM-Assisted Request Mutation for Whitebox REST API Testing (Virtual Attendance)
Research Track
Jia Li The Chinese University of Hong Kong, Jiacheng Shen Duke Kunshan University, Yuxin Su Sun Yat-sen University, Michael Lyu The Chinese University of Hong Kong