Example DSL #153
Replies: 16 comments 32 replies
-
A word about me:
-
Let's go at it. From the current npm package documentation, I gather that the plan is to draft a grammar for the DSL. From that, I should have a parser, LSP, and types generated (Langium CLI). With some extra effort, that means I should be able to write texts in my brand new language in a VS Code editor (that is the vscode language server thing). And with even more effort, I should be able to customize the whole thing. Immediate questions:
Alright, let's draft that grammar. So:
Fair enough. I wonder how I am going to handle line returns and spaces with this language. It should be ok because clauses all fit in one line. Spaces are generally meaningless but there is some indentation to handle in the daily intake section of the language:
Let's keep that in some part of the mind, and move on.
What does very similar mean?? What is similar, what is different? How would I even know? Ok, I see that there are examples. That may help. So let's look at the Xtext documentation and the examples.

Xtext documentation

Well.... the whole thing is dense and abstruse enough that this will be left for another day. Let's move to the examples. That might work better.

Examples

I maintain a state machine library so let's look at the state machine example: What am I supposed to look at? Alright, let's look at package.json:

"langium": {
    "languageId": "statemachine",
    "grammar": "src/language-server/statemachine.langium",
    "extensions": [".statemachine"],
    "out": "src/language-server/generated",
    "textMate": {
        "out": "./syntaxes/statemachine.tmLanguage.json"
    }
}

Looks like the grammar is here: src/language-server/statemachine.langium. Let's look.
Ok, I can recognize terminal symbols, some regexps, some parsing rules, albeit with a strange syntax which is probably the Xtext part of things. First impression is that this is not very readable vs. eBNF. But then again, this may just be the result of unfamiliarity. Let's try to figure out the grammar from some examples of the DSL. Where am I going to find that... The package.json ( ... Well I could not find any. The next best course of action is to go back to Xtext then. TO BE CONTINUED
-
This continues #153 (comment) after failing to find a Learning Xtext
👍
Excellent. That is what I am going to have to figure out.
Not really useful at this point, but why not.
Let's have a look! https://www.eclipse.org/Xtext/documentation/102_domainmodelwalkthrough.html#write-your-own-grammar Oh I don't know why I thought this was a video. Probably because of the mention of the 15mn. So this is a text tutorial whose time to completion is estimated to be 15mn. 15mn sounds appealing so let's do that. The tutorial starts with an example of the target language. Good, that's exactly what I was missing previously with the statemachine language. In summary:
I don't have Eclipse on my PC. It is unclear what exactly is involved in installing Eclipse and Xtext (dependencies, environment, etc.), how Eclipse is started, and how you set up a workspace. I now seriously doubt that this will be a 15mn tutorial. I could end up down a rabbit hole. In the end I just want to understand how to write a grammar so I am going to skip this until I have no choice.
Ok that's useful. There is some kind of header and then what looks like parsing rules. The explanation given in the tutorial is clear. I gather that:
Some thinking:
Good summary.
Let's have a look. EPackage, Ecore, etc.... We'll skip that in the end. That is however useful:
as we encountered mention of linking previously without explanation of what the term meant. The part about terminal rules is interesting and easy to follow. The part about Ok, that shed some light on the Xtext way of describing grammars and generating parsers. Let's see if that helps understand the statemachine language:
Well:
Conclusion:
Alright, that's it for today. Next step is to see how to write our diet grammar with the information that we have gathered and brace for going through the unavoidable error messages that will result as we make our way through.
-
This continues #153 (comment). Here we are going to try implementing our diet grammar. We start with installing VSCode and following the instructions documented in the npm package:
Remarks:
This is exciting, we did not have much to do to get the minimal Original grammar:
We add the
So now we have the second attempt:
OK let's see how lucky we are. We get from VS Code:
So we have to create an object it seems.
We obediently run
with the following results:
No errors!! yay. F5 to open a new VS Code window. And... No code completion... In fact, we still have the Hello completion taking place. Ok, so closing everything and back. Same issue. So what is a man to do?? Looking at the Restarting VS Code. Running F5 and no, still Hello in there. Back at our main VS Code window it seems that the ast.ts code is wrong. Indeed we have errors shown in the
and this line
Ok, let's return to the grammar. Maybe we should try again having UNIT as a terminal? Copying Ok found one mistake I made. VS Code requires you to save the file. I am using WebStorm which does that automatically... So of course the grammar wasn't updated if the file was not saved... [Edit]: there is an option in VS Code that auto-saves files. Don't know for the life of me why this is not the default but it is easy enough to activate.
Ok new version of the grammar:
Still does not work... Ok let's make it even simpler:
Only one error this time:
(in ast.ts : That is not very encouraging. We have the most minimal grammar we can think of, no errors reported from the grammar declaration language in VS Code, and not a clue about what could be going wrong. That is about it. We are blocked and we'll call it a day.
-
This continues #153 (comment) After meeting a hurdle, we are now updating our strategy to first write the eBNF part of the grammar declaration language. The idea is to first test that we have the right language, before thinking about the AST we need to exploit the language programmatically:
(the last line is a workaround for a bug. Cf. #153 (reply in thread)). Unfortunately that doesn't work:
So we really have to do both together at the same time... Shame, I would already be done with the eBNF part and feel like I had achieved something valuable. Mmm, let's try to see if we can create ONE object and then just forget about the rest. No, actually that wouldn't work, we are already creating one
Ok, let's add back the units (g, ml, cl, etc.):
And.... that doesn't work. The grammar simply does not seem to have been updated?
Adding a
So maybe the regexp syntax is not the JS syntax? Let's simplify => still no luck. So now let's copy a regexp that we know is working ( => same. Ok so the issue is likely not in the Xtext. mmm I probably forgot to run the build... let's do that.
?? What the hell is Person doing there? I am going crazy. For some reason,
This is probably a Langium error? Ok deleting validator.ts file -> not regenerated by npm run langium:generate. This is maybe because the language (.langium) has not changed? So we change it. Nope, same story. dieta.validator.ts is not regenerated. Mmm ok, so I guess I also have to understand validator files now... Go back to read me some Xtext stuff. Ok I get more or less what the hell this does and I remove all Person references while keeping the core classes empty so it does not complain. Navigating TypeScript errors ( Holding my breath.... it works. Ok so now let's modify the grammar back to what we originally wanted ( Let's try mmm so we definitely have an issue with regexp. This likely does not follow the JS syntax. Not sure what to do here, there is no example of the regexp syntax used by Langium. Or maybe this is a bug. That's all for today.
-
This continues #153 (comment) As mentioned in related comments, the previously detected issue probably comes from the Chevrotain lexing algorithm, given that we have fed it ambiguous tokenization rules (an ID could be a UNIT). The ambiguity is probably limited to tokenization as grammars are written to be unambiguous. So here either we remove ID as a terminal rule, or we remove UNIT as a terminal rule, or we remove both as terminal rules, and leave terminal rules to be, well, mere tokens, i.e. something small, simple and unambiguous. Comments for instance would not qualify as simple tokens. ADR (architecture decision record) to think about.
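To make the "an ID could be a UNIT" ambiguity concrete, here is a toy first-match tokenizer. It is only a sketch and not Chevrotain's actual implementation: the rule names, the patterns, and the `tokenizeFirstMatch` helper are assumptions made up for illustration. It shows how the order of overlapping terminal rules decides which token type a piece of input gets.

```typescript
// Toy first-match tokenizer. NOT Chevrotain's implementation: it only
// illustrates how overlapping terminal rules can shadow one another.
type TerminalRule = { name: string; pattern: RegExp };
type Token = { type: string; text: string };

function tokenizeFirstMatch(input: string, rules: TerminalRule[]): Token[] {
  const tokens: Token[] = [];
  let offset = 0;
  while (offset < input.length) {
    let matched = false;
    for (const rule of rules) {
      // Sticky flag: the pattern has to match exactly at `offset`.
      const sticky = new RegExp(rule.pattern.source, "y");
      sticky.lastIndex = offset;
      const result = sticky.exec(input);
      if (result !== null && result[0].length > 0) {
        // Whitespace is consumed but not reported (a "hidden" terminal).
        if (rule.name !== "WS") {
          tokens.push({ type: rule.name, text: result[0] });
        }
        offset += result[0].length;
        matched = true;
        break; // the first rule that matches wins
      }
    }
    if (!matched) {
      throw new Error(`Unexpected character at offset ${offset}`);
    }
  }
  return tokens;
}

// Hypothetical terminals: every UNIT is also a valid ID.
const WS: TerminalRule = { name: "WS", pattern: /\s+/ };
const INT: TerminalRule = { name: "INT", pattern: /[0-9]+/ };
const ID: TerminalRule = { name: "ID", pattern: /[_a-zA-Z]\w*/ };
const UNIT: TerminalRule = { name: "UNIT", pattern: /mg|ml|cl|dl|g/ };

// With ID listed first, "g" is never seen as a UNIT.
const idFirst = tokenizeFirstMatch("100 g", [WS, INT, ID, UNIT]);
// With UNIT listed first, "g" is a UNIT, but identifiers starting with "g" get cut.
const unitFirst = tokenizeFirstMatch("100 g", [WS, INT, UNIT, ID]);
const shadowed = tokenizeFirstMatch("grape", [WS, INT, UNIT, ID]);

console.log(idFirst, unitFirst, shadowed);
```

In this toy model neither ordering is satisfactory: ID first hides the unit, and UNIT first splits `grape` into a UNIT `g` followed by an ID `rape`. That is consistent with the idea above of keeping units out of the terminal rules altogether.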
Given that terminal rules are regexps, this comes down to the following problem: given two regexps r1 and r2, determine whether the intersection of r1 and r2 is empty or not. There is a mathematical theorem that guarantees that there is an r that matches exactly the strings matched by both r1 and r2 (cf. https://sci-hub.se/10.1145/2071368.2071372). I am not however aware of any algorithm that is able to compute the cardinality of the set of strings matched by r. An alternative approach is to construct r, run a ton of arbitrary strings through it, and check that none matches the regexp. A simpler approach is to generate arbitrary strings (cf. https://github.com/dubzzz/fast-check) in some smart ways that exercise the r1 and r2 branches. But well, neither of the two approaches is exactly trivial to do well in the general case. Food for thought. Moving on...
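The "generate strings and see whether both regexps accept them" idea can be sketched as a bounded brute-force probe. This is a minimal sketch under stated assumptions: the patterns, the alphabet, and the helper names (`enumerateStrings`, `matchesFully`, `findOverlap`) are invented for illustration, and plain enumeration stands in for a property-based generator such as fast-check. An empty result only means that no overlap exists up to the chosen length; it proves nothing beyond that. For patterns that are truly regular, emptiness of the intersection can in principle be decided with a product automaton, which this sketch does not attempt.

```typescript
// Bounded brute-force probe for overlap between two terminal patterns.
// All strings over `alphabet` with length 1..maxLength, shortest first.
function enumerateStrings(alphabet: string[], maxLength: number): string[] {
  const all: string[] = [];
  let frontier: string[] = [""];
  for (let length = 1; length <= maxLength; length++) {
    const next: string[] = [];
    for (const prefix of frontier) {
      for (const symbol of alphabet) {
        next.push(prefix + symbol);
      }
    }
    all.push(...next);
    frontier = next;
  }
  return all;
}

// Anchor the pattern so that it has to match the whole candidate string.
function matchesFully(pattern: RegExp, candidate: string): boolean {
  return new RegExp(`^(?:${pattern.source})$`).test(candidate);
}

// Strings accepted by both patterns: witnesses of a non-empty intersection.
function findOverlap(r1: RegExp, r2: RegExp, alphabet: string[], maxLength: number): string[] {
  return enumerateStrings(alphabet, maxLength).filter(
    (candidate) => matchesFully(r1, candidate) && matchesFully(r2, candidate)
  );
}

// Hypothetical terminal patterns and a small probing alphabet.
const idPattern = /[_a-zA-Z]\w*/;
const unitPattern = /mg|ml|cl|dl|g/;
const intPattern = /[0-9]+/;
const probeAlphabet = ["g", "m", "l", "c", "d", "0", "_"];

const idVsUnit = findOverlap(idPattern, unitPattern, probeAlphabet, 2);
const idVsInt = findOverlap(idPattern, intPattern, probeAlphabet, 2);

console.log(idVsUnit, idVsInt);
```

With these assumed patterns, every unit turns up as a witness for the ID/UNIT pair, while the ID/INT pair yields no witness within the bound.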
-
So the suggested changes to the grammar work like a charm:
Let's add the food item, type, and category.
We get the following error in VS:
What the hell is a datatype rule?? => https://www.google.com/search?q=data+type+rule+xtext&oq=data+type+rule+xtext&aqs=chrome..69i57.2580j0j7&sourceid=chrome&ie=UTF-8 => https://www.eclipse.org/Xtext/documentation/301_grammarlanguage.html#datatype-rules
??
Right, but it is not a normal parser rule, so still waiting for the difference. So data type rules can use hidden tokens.
Ok that is more clear as to what distinguishes data type rules: rules with no call to other parser rules, no actions, no assignments. Oh wait, I have the previous errors back again:
(Note how I removed the returns from the Food_Item rule. It could not return a string anyway, but an array of strings, and it is unclear what the syntax is in the grammar declaration language for an array of strings...) gives:
I thought that was working before. What happened again? ast.ts:
Let's walk it back. Turns out it was already giving webpack errors but the In other words, the webpack errors can (at least here) be ignored?? Let's add just the food item back:
No can do: So the ast.ts type error is blocking. I am stuck again. This time I really call it a day.
-
This continues #153 (comment) Ok so a confusing thing here that I am just realizing is that
So the
So, now that I run
Note that this grammar as recommended here does inline the definition of Let's try this in VS Code. Completion of the unit now works: Two interesting issues though: Alright, let's go by step. On the
That does not really make sense, at least at this position. The funny thing is that if I add a line return at the end after milk, that warning disappears? This could possibly be a bug in the generated LSP? Moving on to the next warning because I don't think there is a mistake in the grammar declaration language here:
Not sure how the first warning affects the validity of the subsequent warnings (this often happens when writing incorrect code, right: one mistake upstream provokes a ton of mistakes downstream). In any case, VS Code does not recognize milk as being a Food_Item. At this point I am unable to discriminate where the error lies. It could be the LSP, it could be the parser, it could be the lexer, and well it could be the grammar but I doubt it. Let's look at the parser (parser.ts):
The parser does not look obviously bad.
The only way to know is to test it. The good news is that Chevrotain seems to have a playground: https://chevrotain.io/playground/. Let's try to put that code in the playground. So no that's not going to work because Blocked again! We experiment with simplifying the grammar to isolate the error. Skipping failing steps, this works:
but this doesn't:
There is progress though, the first 100 is not showing a warning anymore here: but it is here: Then I get this when trying again:
So we do this:
We get:
which is expected given the input For this input, the parser does not seem to recognize the numbers: That's all for today. My intuition here is this is probably a bug in Langium more than a mistake in the grammar definition.
-
Made some progress with the following grammar:
So the issue before was probably from Chevrotain parser as was visible from the logs. The ID rule was likely conflicting with the INT rule.
Now I get this:
This time the issue seems to be with the parsing of spaces? ' g' should be parsed as a WS terminal and a unit. For some reason that does not happen. I solved it!!!! The key was to replace in
by
This is the same issue as before in the end:
Let's now try something else: replacing with And all is swell: Let's make a food item consist of several words.
That works great: Let's add food type and category slowly:
Error is We try that:
Error is
Not sure why that is. Next try:
That compiles! However the comma is not correctly parsed. Next try: remove the +:
Same issue: Let's try this just in case:
And yes that worked... So the issue there again was that the Adding the food category now:
That works!: Ok so we have done a small part of a small part of our target grammar :-) Time to make a summary of findings.
-
Let's now add the kcal:
Error is Let's do this for now:
It is probably not the same semantics but let's test and see. So this is allowed:
when it should not. This is also allowed:
that too:
so in the end it is not too bad. We can fine-tune later.
-
Note on the documentation effort:

Conceptual framework

Self-learning and discovery

In any sufficiently complex software, a significant portion of users will not go and check the documentation (because the complexity of that is in proportion to that of the software), no matter how good that is. That makes it important to provide facilities for the user to discover by himself what works and what does not. In that sense, these help:

Information architecture

Information provided to the user must be organized so he discovers:

Conceptual guides

Users must master the key terminology (universal language for the language modeling domain) that will be useful to discuss with peers and support team:

API reference

FAQ/tips

A few ideas

Here are a few ideas that I believe may help drive adoption. They are quoted without any consideration of the time and effort that they may require :-) The idea is to list them all so you can see what makes sense given the direction of the project and the available resources.
A good example of the three previous points is the Nearley playground:
The input (left)/output (right) metaphor works well with the immediate feedback to learn the grammar definition language autonomously. The separation of grammar definition (eBNF syntax) and CST transformation (standard JS) also serves to decrease the cognitive load of users who have less to learn (little new syntax), can focus on each task separately (check that the language is defined correctly, then check that the AST is correctly produced), backed by the REPL. This may or may not be possible to replicate exactly with Xtext as is but the usability characteristics that Nearley exhibits are definitely desirable to achieve. That + the specific advantages of Xtext (cross-references, LSP) would be a winning combo.

Personal opinion

Quick feedback as a user after toiling to implement what in my mind was a simple grammar that I would already have implemented in a few hours with other generators:
That is very preliminary and I haven't given it too much thought. Leaving this here anyway for future reference.
-
This follows #153 (comment) Here we try to correct one issue we encountered before. We had to use this rule:
instead of that one:
because of the error We try to create a Kcal rule to hold the string, but the same error appears (makes sense, this is just another rule). So the only option here is to encode that string as a terminal rule, but then we will run into the same errors again due to ambiguous terminal rules... So actually no real solution here.
-
This continues #153 (comment) In the previous session, I tried to install npm 7.20, that failed three times. I did not try again and just continued as if nothing happened. In the terminal So where were we? We wanted to add the list of nutrients, for instance:
Ok that should not be too hard, provided that we do not declare A first issue we can think of is that it would be very nice to have the So what is a man to do? We could change the
Compilation is fine:
and... not much changes. So now, spaces will not be used as relevant for parsing. But the end of line will. That means we need to add it to our parsing rules:
and
Well, we don't seem to have a problem here (and we should): Maybe that's due to the '... ...' kind of illegal syntax? Let's try to insert a break somewhere else: Oh but all hell broke loose:
Let's ignore that and add a line break before
Perfect. The line breaks are not detected in long strings '... ...' but otherwise we are good. Well, except:
Not sure what's going on here. NOTE TO SELF: pay attention to what is in this In any case I don't have a clue of why anything of any kind is not a valid reference ID. Let's have a look at a valid grammar from the example, see if we don't get some inspiration from there:
So having a look at [Event] here, we do not seem to do anything different, except that we don't need an array here. There is only one food category to declare. So yeah no clue. @msujew help!
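For context on what a cross-reference such as [Event] or [Category] implies: in Xtext-style grammars, the referring side is parsed as a plain name (an ID by default) and is resolved afterwards against declared elements that carry that name. The sketch below is purely conceptual. The `CategoryDecl` and `ItemDecl` types and the `linkCategories` function are hypothetical; they are neither the AST that Langium generates nor its linker API.

```typescript
// Conceptual sketch of name-based linking. Hypothetical types: this is not
// the AST generated by Langium, nor its linker API.
interface CategoryDecl {
  name: string;
}

interface ItemDecl {
  name: string;
  categoryRef: string; // the name written where the grammar says [Category]
}

interface LinkResult {
  resolved: Map<string, CategoryDecl>; // item name -> category it points to
  errors: string[];
}

function linkCategories(categories: CategoryDecl[], items: ItemDecl[]): LinkResult {
  // Step 1: index every declaration by its name.
  const index = new Map<string, CategoryDecl>();
  for (const category of categories) {
    index.set(category.name, category);
  }

  // Step 2: resolve each reference by looking its name up in the index.
  const resolved = new Map<string, CategoryDecl>();
  const errors: string[] = [];
  for (const item of items) {
    const target = index.get(item.categoryRef);
    if (target === undefined) {
      errors.push(`Unknown category '${item.categoryRef}' referenced by item '${item.name}'`);
    } else {
      resolved.set(item.name, target);
    }
  }
  return { resolved, errors };
}

const linkDemo = linkCategories(
  [{ name: "Fruit" }],
  [
    { name: "apple", categoryRef: "Fruit" },
    { name: "milk", categoryRef: "Dairy" }, // never declared, so it cannot resolve
  ]
);

console.log(linkDemo.errors);
```

Under this model, a reference can only resolve if a declaration with exactly that name exists somewhere in the document, which may be worth checking when a reference is reported as invalid.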
-
This continues #153 (comment) I adjusted the grammar to be able to parse the end of lines correctly. The final grammar is as follows:
NOTE TO SELF: handling the end of lines manually requires some discipline. The rule here is to add the EOL token as high/early as possible in the hierarchy of rules.
-
This continues #153 (comment). Let's add comments to the language (comments start with
That of course does not work! Wait, actually I don't know why it does not work. Let's investigate: So we still have some issues to handle with the EOL. No, it is just the error message from VS Code that leads us to the wrong path: NOTE TO SELF: what to do about these errors? They are obviously nice, but how to check them so they are not misleading?? Will we have to write custom processing for everything LSP? That decreases the advantage precisely of not having to write the LSP ourselves. To think about. Anyways, we don't have our comments working. Let's go back to the double So what is the issue here? Why substituting the So the problem really is with the character
We get: Still failing... @msujew what am I doing wrong here? Ok I'll try not to use a terminal token but add a comment rule to the grammar:
We get: ?? Let's remove the SL_COMMENT terminal rule. Ah no maybe the issue was with
That works, except the EOL: So let's do
Not sure I understand this. I should not have to add the EOL rule here, it should be picked by the top-level Model rule right? Anyways, we make some progress, it almost works: Once again remaining issue with EOL. So now I do:
And that works!! Final grammar:
-
This continues #153 (comment) This time we want to add the ability for users to define food categories, food types, and different kinds of meals. For instance:

Meals:
- breakfast
- lunch
- dinner
Categories:
- Fruit
- Vegetable
- Legume
- Sweet
- Fish
- Meat
- Milk product
Types:
- Solid
- Liquid
- Liquid (cream-like)
- Liquid (sauce-like)
- Liquid (oil-like)
- Liquid (water-like)

As a reminder, the current state of our grammar is as follows:

grammar Dieta
hidden(WS, ML_COMMENT)
Model:
(items+=Item | types+=Type | categories+=Category | Comment | EOL)*;
Item:
quantity=INT unit=('g' | 'mg' | 'ml' | 'cl' | 'dl') 'of' name=STRING '(' type=[Type] ',' category=[Category] ')'
'provides' cal=INT 'kcal' ',' 'with' 'the' 'following' 'nutrients:' EOL (nutrients+=Nutrient EOL)*;
Type: 'type' name=ID;
Category: 'category' name=ID;
Nutrient: '-' name=ID ':' quantity=INT unit=('g' | 'mg' | 'ml' | 'cl' | 'dl');
Comment returns string: '--' ID* EOL+;
terminal WS: /[ \f\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]+/;
terminal EOL: /\r?\n/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal INT returns number: /[0-9]+(\.[0-9]+)?/;
terminal STRING: /"[^"]*"|'[^']*'/;
terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;

We are first going to add meals:
Works like a charm: After some reorganization, we try to add the food categories. It will be interesting to see if we can correctly use the cross-reference capability of Langium:

grammar Dieta
hidden(WS, ML_COMMENT)
Model:
(Meals | Categories | types+=Type | items+=Item | Comment | EOL)*;
Meals: 'Meals:' EOL meals+=Meal+;
Meal: '-' name=ID EOL;
Categories: 'Categories:' EOL categories+=Category+;
Category: '-' name=ID EOL;
Type: 'type' name=ID;
Item:
quantity=INT unit=('g' | 'mg' | 'ml' | 'cl' | 'dl') 'of' name=STRING '(' type=[Type] ',' category=[Category] ')'
'provides' cal=INT 'kcal' ',' 'with' 'the' 'following' 'nutrients:' EOL (nutrients+=Nutrient EOL)*;
Nutrient: '-' name=ID ':' quantity=INT unit=('g' | 'mg' | 'ml' | 'cl' | 'dl');
Comment returns string: '--' ID* EOL+;
terminal WS: /[ \f\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]+/;
terminal EOL: /\r?\n/;
terminal ID: /[_a-zA-Z][\w_]*/;
terminal INT returns number: /[0-9]+(\.[0-9]+)?/;
terminal STRING: /"[^"]*"|'[^']*'/;
terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;

Again, works like a charm: So let's move on to food types:
Well, the food type parsing itself works, the problem seems to be with parsing comments: vs. Leaving that for another day.
-
This is linked to #139
This thread aims at documenting, as they occur, the questions/answers/pain points faced while implementing a simple DSL. This is a learning-in-public approach, which is useful when designing tutorials to identify the problems faced by users and how they solve or fail to solve them. Other methods include filming (or watching) a user as they go through their tasks. It is simpler for me here to just note down everything that comes to my mind (stream of consciousness).
The DSL is as follows:
Objectives
This DSL is created to illustrate the steps to create a DSL with Langium and its associated tooling. The DSL is chosen to be reasonably realistic and simple rather than to exercise many parts of the Langium ecosystem.
Language goals
The language aims at supporting individuals that are interested in following a diet. The DSL supports:
In the first draft of this DSL, we will only pursue the first two goals and the last one.
Language definition
Aliments and their nutrition profile
Aliments may have a type and category (legumes, liquid). Aliments provide a given number of calories and have a nutritional profile.
The nutritional profile should logically have mandatory components (proteins, carbs, fats) but also optional user-defined components (fiber, etc.).
If the nutrient information is not available for a category, a ? can be used to indicate that unavailability. The list of available measuring units should be fixed (g, mg, ml, cl, dl, l). Unless otherwise indicated, 1l is converted to 1kg.
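The unit rule stated above (a fixed unit list and "1l is converted to 1kg" by default) could be sketched as follows. This is only an illustration under assumptions: the `Unit` type, the `GRAMS_PER_UNIT` table, and the `toGrams` helper with its `gramsPerMl` override are hypothetical names, and the override is one possible reading of "unless otherwise indicated".

```typescript
// Fixed unit list taken from the language definition. The factors assume the
// default rule "1 l is converted to 1 kg", i.e. a density of 1 g per ml.
type Unit = "g" | "mg" | "ml" | "cl" | "dl" | "l";

const GRAMS_PER_UNIT: Record<Unit, number> = {
  g: 1,
  mg: 0.001,
  ml: 1,
  cl: 10,
  dl: 100,
  l: 1000,
};

// Hypothetical helper: `gramsPerMl` overrides the default density and is only
// applied to volume units.
function toGrams(quantity: number, unit: Unit, gramsPerMl: number = 1): number {
  const isVolume = unit !== "g" && unit !== "mg";
  return quantity * GRAMS_PER_UNIT[unit] * (isVolume ? gramsPerMl : 1);
}

console.log(toGrams(1, "l")); // default rule: one litre counts as 1000 g
console.log(toGrams(25, "cl")); // 25 cl counts as 250 g
```

Keeping the unit list as a closed union type mirrors the requirement that the list of units be fixed: an unknown unit is rejected at compile time rather than silently converted.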
An example is as follows:
Daily food intake
The daily food intake section of the DSL lets the user declare his food intake across the meals of a given day. Meals are a list of aliments whose portion/quantity can be specified. To make it more user-friendly and closer to natural language, we define two distinct formats to enter that list:
An example is as follows:
Note that the existence of two formats creates edge cases that will have to be handled. Such is the case when, on the same day and for the same meal, a user defines distinct food intakes in distinct formats. The food intakes should be summed appropriately nonetheless.
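One way to picture the summing requirement: assume both input formats are first reduced to the same normalized record, and quantities are then added up per day, meal, and food. The `Intake` shape and the `sumIntakes` function below are hypothetical and only sketch that idea; how each format is parsed into records is left out.

```typescript
// Hypothetical normalized record: both input formats are assumed to be
// reduced to this shape before aggregation.
interface Intake {
  day: string;
  meal: string;
  food: string;
  grams: number;
}

// Sum quantities per (day, meal, food), whatever format each entry came from.
function sumIntakes(entries: Intake[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const entry of entries) {
    const key = `${entry.day}|${entry.meal}|${entry.food}`;
    totals.set(key, (totals.get(key) ?? 0) + entry.grams);
  }
  return totals;
}

const intakeTotals = sumIntakes([
  { day: "2021-07-01", meal: "breakfast", food: "milk", grams: 200 }, // e.g. from the first format
  { day: "2021-07-01", meal: "breakfast", food: "milk", grams: 50 }, // e.g. from the second format
  { day: "2021-07-01", meal: "lunch", food: "milk", grams: 100 },
]);

console.log(intakeTotals);
```

The two breakfast entries collapse into a single 250 g total while the lunch entry stays separate, which is the behaviour the edge case calls for.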
User configuration
There should be defaults so that this is rather infrequent.
An example is as follows: