Hey @Fredderic, we have a WIP guide on this, see eclipse-langium/langium-website#219. It outlines how to support keywords as identifiers. Note that all of the computation that turns the Langium grammar into Chevrotain tokens happens in an overridable service, the TokenBuilder. You might want to take a look at that.
I'm trying to migrate a C-like language extension over to Langium. My hand-rolled parser is great as long as the document syntax is correct, but not so good while you're actively writing it: it doesn't handle missing or extra tokens or any of that (though it does have proper error messages). It was also ported from another language at a time when I knew neither TypeScript nor anything about implementing VS Code extensions, so it's in a less than ideal style. But I've run into the same problems as many other people, and can't seem to find a simple working solution anywhere (or an explanation of how they fixed it that makes sense). I was hoping that once I get the language parsing done, I can draw on the way Langium structures things as I bring over the rest of it (optimiser and debugger).
Typically in my hand-rolled parsers, I'll have a tokenizer consume anything that looks like an identifier, then check it against a keyword list and change its type accordingly. Similarly for symbols: I consume the longest symbol I can match in the tokenizer (a regex of the symbol tokens, longest to shortest) and emit those as "keyword" tokens too. Strings and numbers are likewise emitted as whole tokens of their respective types. The parser then just consumes a token stream, checking keyword tokens by simple exact string match. It loses a little generality, but it seems to be extremely common for most programming grammars and avoids a whole bunch of nasty lexing issues, so I'm wondering if Langium has a similar capability.
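For readers who want the approach above spelled out, here is a minimal sketch of that kind of tokenizer. It is plain TypeScript and does not use Langium or Chevrotain; the keyword list, the symbol list and all names (`KEYWORDS`, `SYMBOLS`, `tokenize`) are made up for illustration.

```typescript
type TokenKind = "keyword" | "identifier" | "number" | "string";

interface Token {
    kind: TokenKind;
    text: string;
    offset: number;
}

// Hypothetical keyword and symbol sets for a small C-like language.
const KEYWORDS = new Set(["if", "else", "while", "return", "int"]);
const SYMBOLS = ["<", "<<", "<<=", "=", "==", "+", "-", "(", ")", "{", "}", ";"];

function escapeRegExp(s: string): string {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sort longest first so the alternation prefers "<<=" over "<<" over "<".
const SYMBOL_PATTERN = [...SYMBOLS]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");

// Sticky regex: whitespace, or one capture group per token class.
const TOKEN_REGEX = new RegExp(
    `\\s+|([A-Za-z_][A-Za-z0-9_]*)|(\\d+(?:\\.\\d+)?)|("(?:[^"\\\\]|\\\\.)*")|(${SYMBOL_PATTERN})`,
    "y"
);

function tokenize(input: string): Token[] {
    const tokens: Token[] = [];
    let offset = 0;
    while (offset < input.length) {
        TOKEN_REGEX.lastIndex = offset;
        const m = TOKEN_REGEX.exec(input);
        if (m === null) {
            throw new Error(`Unexpected character '${input[offset]}' at offset ${offset}`);
        }
        const [text, ident, num, str, sym] = m;
        if (ident !== undefined) {
            // Consume the whole identifier first, then decide whether it is a keyword.
            tokens.push({ kind: KEYWORDS.has(ident) ? "keyword" : "identifier", text, offset });
        } else if (num !== undefined) {
            tokens.push({ kind: "number", text, offset });
        } else if (str !== undefined) {
            tokens.push({ kind: "string", text, offset });
        } else if (sym !== undefined) {
            // Symbols are emitted as "keyword" tokens as well.
            tokens.push({ kind: "keyword", text, offset });
        }
        // Anything else was whitespace and is skipped.
        offset = TOKEN_REGEX.lastIndex;
    }
    return tokens;
}
```

Because the identifier is consumed whole before the keyword check, an input such as `iffy` never gets split into the keyword `if` followed by `fy`.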
There's that `longer_alt` thing (demonstrated in `keywords_vs_identifiers.js`, which seems like a weird hacky way of doing the same job), but I can't figure out how to actually use it alongside a `.langium` file. Plus, would it even work for symbol tokens too? I also saw that Chevrotain can define custom token matchers (though again, using the `parser.ts` method), so is there a setting to just use one for all keywords? That is, every time Langium encounters a keyword string, it uses a specified matcher that attempts to parse an entire identifier and then does a simple string compare of the result (more of a yacc/bison style). That would probably solve a lot of people's issues. Also, custom matchers can add in a payload, such as the parsed value (the inner text of a string, the numeric form of a number, weirdness to work around JS not supporting negative NaNs, etc.), which I generally add into my tokenizers. It might even be worthwhile having a mechanism for `.langium` files to specify that keywords matching a given pattern should use a specific matcher imported from another `.ts` file. (Unless, of course, Langium just shoves everything into one big regex matcher, or something.)

Alternatively, is there a hook that will let me edit the generated list of keywords at generate or import time? That would allow me to sort them myself, tack on an "end of keyword" look-ahead, do the aforementioned custom matcher and pre-processing, or whatever else I need to do to make it actually work.
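To make the matcher idea concrete, this is roughly what I mean, as a standalone sketch. The `Matcher` and `MatchResult` types are my own stand-ins, loosely modelled on how I understand Chevrotain's custom token patterns (a function of the text and a start offset that returns a match array or `null`, optionally carrying a `payload`); none of this is actual Langium or Chevrotain API, and how to wire it into a `.langium` grammar is exactly the part I don't know.

```typescript
// Assumed shape: an array whose first element is the matched text, plus an optional payload.
interface MatchResult extends Array<string> {
    payload?: unknown;
}
type Matcher = (text: string, offset: number) => MatchResult | null;

const IDENT = /[A-Za-z_][A-Za-z0-9_]*/y;
const NUMBER = /\d+(?:\.\d+)?/y;

// Hypothetical factory: match a whole identifier, then compare it to the keyword.
function keywordMatcher(keyword: string): Matcher {
    return (text, offset) => {
        IDENT.lastIndex = offset;
        const m = IDENT.exec(text);
        // "iffy" yields the identifier "iffy", which is not "if", so no match.
        if (m === null || m[0] !== keyword) {
            return null;
        }
        return [m[0]];
    };
}

// Hypothetical number matcher that attaches the parsed value as a payload.
const numberMatcher: Matcher = (text, offset) => {
    NUMBER.lastIndex = offset;
    const m = NUMBER.exec(text);
    if (m === null) {
        return null;
    }
    const result: MatchResult = [m[0]];
    result.payload = Number(m[0]);
    return result;
};
```

With this, `keywordMatcher("if")` accepts `if (x)` at offset 0 but rejects `iffy`, without needing any keyword ordering or look-ahead tricks.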