Multiple Lexicalisation - A Java Based Study (SLE 2019)

Who

Elizabeth Scott, Adrian Johnstone

Track

SLE 2019

Time Zone

The program is currently displayed in (GMT+03:00) Beirut.

Use conference time zone: (GMT+03:00) BeirutSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Mon 21 Oct 2019 16:00 - 16:30 at Templars - Session 4: Parsing Chair(s): Adrian Johnstone

Abstract

We consider the possibility of making the lexicalisation phase of compilation more powerful by avoiding the need for the lexer to return a single token string from the input character string. This has the potential to empower language design by softening the boundaries between lexical and phrase level specification. The large number of lexicalisations makes it impractical to parse each one individually, but it is possible to share the parsing of common subparts, reducing the number of tokens parsed from the product of the token numbers associated with the components to their sum. We report total numbers of lexicalisations of example Java strings, and the impact on these numbers of various lexical disambiguation strategies, and we introduce a new generalised parsing technique that can efficiently parse multiple lexicalisations of character string simultaneously. We then use this technique on Java, reporting on the number of lexicalisations that correspond to syntactically correct Java strings and the degree to which the standard Java lexer is safe in the sense that it does not remove all the syntactically correct lexicalisations of an input character string. Our multi-lexer parser is an alternative to scannerless parsing of a character level grammar, retaining the separation between grammar terminals and the corresponding lexical tokens. This has the advantages of allowing the parser to use terminal level lookahead and keeping lexical level disambiguation separate from the context free grammar.

Elizabeth Scott

Royal Holloway University of London

Adrian Johnstone

Royal Holloway, University of London