Multiverse Recursive Descent Grammar Exploration
The theory behind syntactic analysis is well-established: context-free grammars describe the rules by which the sentences (programs) of a (programming) language can be generated, and parsers that establish the syntactic validity of programs can be derived from grammar specifications. Syntactic analysis is also practical: fast, deterministic parsers can be derived from grammars that satisfy certain properties, parsers can give helpful error messages that let programmers diagnose syntactic errors in programs, and some parsers are even capable of correcting certain errors automatically.
Problems arise, however, when the specified grammar does not satisfy desirable properties. For example, some grammars do not admit a (complete) deterministic parser (resulting in, e.g., parse table conflicts), some grammars may be ambiguous (requiring disambiguation strategies), and some grammars may be left-recursive (ruling out terminating top-down parsers). The theoretical solution is simple: refactor your grammar until it satisfies the desired properties. In practice, however, and especially for large grammars, it may not be so easy to diagnose the problems to be addressed. Ambiguities in particular are easy to introduce, and sources of ambiguity are notoriously difficult to identify (apart from the most common and recurring cases such as operator precedence and associativity). This is especially the case when designing the syntax of languages outside of the mainstream.
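As a small illustration of one of these properties, the following sketch checks a grammar for left-recursive nonterminals. It is not taken from the talk: the dictionary encoding of grammars, the example grammar and all function names are assumptions made for illustration only.

```python
# Hypothetical grammar encoding (an assumption, not the talk's notation):
# nonterminal -> list of alternatives, each alternative a list of symbols.
# Any symbol that is not a key of the dictionary is treated as a terminal.
EXPR_GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["(", "E", ")"], ["n"]],
}


def nullable_nonterminals(grammar):
    """Return the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:  # fixed-point iteration
        changed = False
        for nt, alts in grammar.items():
            if nt in nullable:
                continue
            # An alternative is nullable if all of its symbols are nullable
            # (trivially true for the empty alternative).
            if any(all(s in nullable for s in alt) for alt in alts):
                nullable.add(nt)
                changed = True
    return nullable


def left_recursive_nonterminals(grammar):
    """Return the nonterminals A for which A derives A... in one or more steps."""
    nullable = nullable_nonterminals(grammar)
    # left_edges[A]: nonterminals that can occur leftmost after expanding A once.
    left_edges = {nt: set() for nt in grammar}
    for nt, alts in grammar.items():
        for alt in alts:
            for sym in alt:
                if sym not in grammar:
                    break  # a terminal blocks everything to its right
                left_edges[nt].add(sym)
                if sym not in nullable:
                    break  # a non-nullable nonterminal blocks as well
    result = set()
    for start in grammar:
        # Depth-first search over left edges, looking for a cycle back to start.
        seen, stack = set(), list(left_edges[start])
        while stack:
            nt = stack.pop()
            if nt not in seen:
                seen.add(nt)
                stack.extend(left_edges[nt])
        if start in seen:
            result.add(start)
    return result


print(left_recursive_nonterminals(EXPR_GRAMMAR))
```

For the example grammar only `E` is reported, because its first alternative starts with `E` itself. The nullable check also catches hidden left recursion, where the recursive occurrence is preceded by nonterminals that can derive the empty string.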
The root cause of the described problems is the inherent non-determinism of nonterminals with more than one production alternative. Following this observation, and considering a grammar specification as an executable program, we propose debugging grammars using an interpreter with support for multiverse, omniscient, step-wise debugging. We define the interpreter as an instance of the multiverse debugging framework presented at SLE 2023 and introduce a breakpoint language that can be used to explore the execution of (parallel) parsing threads. We demonstrate our approach using several practical scenarios, including the detection of left-recursion, non-predictive nonterminals, and non-productive nonterminals, parsing and error-recovery as a special case of sentence generation, and debugging generalised top-down parsers.
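The idea of treating each choice between production alternatives as a branch into a separate universe can be sketched as follows. This is a heavily simplified illustration, not the interpreter or the breakpoint language from the talk: the grammar encoding, the bounds `max_len` and `max_steps`, and the function name `explore` are all assumptions. Each universe keeps its full trace of choices, loosely mirroring the omniscient aspect, and a sentence reached by more than one leftmost derivation reveals an ambiguity.

```python
from collections import deque

# Hypothetical ambiguous grammar (an assumption): E ::= E "+" E | "n"
AMBIGUOUS_GRAMMAR = {"E": [["E", "+", "E"], ["n"]]}


def explore(grammar, start, max_len, max_steps=20):
    """Explore all leftmost derivations breadth-first, within the given bounds.

    Every queue entry is one 'universe': a sentential form together with the
    trace of (nonterminal, alternative index) choices that produced it.
    Returns a dict mapping each generated sentence to the list of its traces.
    """
    sentences = {}
    queue = deque([((start,), ())])
    while queue:
        form, trace = queue.popleft()
        # Find the leftmost nonterminal, if there is one.
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            # Only terminals are left: this universe has produced a sentence.
            sentences.setdefault(form, []).append(trace)
            continue
        if len(trace) >= max_steps:
            continue  # abandon universes that exceed the step bound
        nt = form[idx]
        # Branch: one new universe per production alternative.
        for alt_no, alt in enumerate(grammar[nt]):
            new_form = form[:idx] + tuple(alt) + form[idx + 1:]
            if len(new_form) <= max_len:  # abandon forms that grow too long
                queue.append((new_form, trace + ((nt, alt_no),)))
    return sentences


result = explore(AMBIGUOUS_GRAMMAR, "E", max_len=5)
ambiguous = {s: ts for s, ts in result.items() if len(ts) > 1}
for sentence, traces in ambiguous.items():
    print(" ".join(sentence), "has", len(traces), "leftmost derivations")
```

With `max_len=5` the only ambiguous sentence found is `n + n + n`, which has two leftmost derivations, corresponding to the left-associative and right-associative readings of `+`. Comparing the two recorded traces shows at which choice the universes diverge.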
Thomas van Binsbergen is investigating modular techniques for the specification of the semantics and syntax of software languages and is applying these techniques for the development of meta-languages and domain-specific languages. Recent topics of interest include fundamental programming construct specification (funcons), incremental and exploratory programming environments (such as REPLs and notebooks), and domain-specific languages in the context of distributed data processing.
Van Binsbergen has developed modular techniques for describing the semantics of programming languages as part of the PLanCompS project with Peter Mosses and parser combinators for generalised top-down parsing with Adrian Johnstone and Elizabeth Scott. The results are described in his PhD thesis titled “Executable Formal Specification of Programming Languages with Reusable Components” (http://ltvanbinsbergen.nl/thesis/thesis.pdf).
Keywords: modular language definition, domain-specific languages, formal specification, modelling languages, policy-enhanced data-sharing, generalised top-down parsing, declarative programming, purely functional programming, I-MSOS, FunCons, attribute grammars, computer science education
