Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore updated categories (specifically for Rascal syntax highlighting) #6

Merged
merged 4 commits into from
Jul 16, 2024

Conversation

sungshik
Copy link
Collaborator

To generate a TextMate grammar for Rascal's own grammar, there are two alternatives for mapping Rascal categories to TextMate scope names:

  • Use the current indirect mapping -- via semantic token types -- which is used by the semantic highlighter
  • Use the previous direct mapping, which is not used anymore (replaced by the indirect mapping)

In principle, the direct mapping can be more precise as it doesn't need to "fit" the abstraction of semantic token types. Thus, in PR #1, we decided to use the direct mapping (discussion).

However, because the TextMate highlighter and the semantic highlighter are now based on different mappings, their results are inconsistent. Notably, string constants (which are assigned category Constant in Rascal's own grammar) are colored differently by the two highlighters (it's mapped to string using the indirect mapping, but to constant using the direct mapping).

For now, this PR restores the use of the indirect mapping instead of the direct mapping when generating a TextMate grammar for Rascal's own grammar. (Grammars for other languages, which presumably define their own mapping from categories to scope names, are unaffected by this PR.)

… consistent with semantic highlighter (for Rascal code)
@sungshik sungshik requested a review from DavyLandman July 12, 2024 13:19
case \tag("category"("Identifier")) => \tag("category"("variable"))
case \tag("category"("Variable")) => \tag("category"("variable"))
case \tag("category"("Constant")) => \tag("category"("constant"))
case \tag("category"("Constant")) => \tag("category"("string")) // Updated (before: constant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making all constant string is not helping. We should do something in the rascal highlighter to fix the underlying issue. Loosing precision here is not so nice.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making all constant string is not helping

The only productions with the @category("Constant") tag in Rascal's own grammar are string-related, so I don't think precision is actually lost here for Rascal's own grammar. (See the other comment for a full overview of the usage of @category tags.)

So, in the context of Rascal's own grammar, I can understand why the semantic highlighter maps Constant to string.

fix the underlying issue

I think the best way forward is to update the categories in Rascal's own grammar (which are "legacy token types") to directly use the intended TextMate scopes. For instance, each occurrence of @category("Constant") would then be replaced with @category("string") (or, for even more precision, @category("string.quoted.double") and @category("string.quoted.single"), depending on the production).

However, if it's logistically complicated to update Rascal's own grammar right now, and/or release such an update in the near future, we could also temporarily use a preprocessor to rewrite categories when generating a TextMate grammar -- only for Rascal's own grammar (other grammars unaffected). That's essentially this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I assumed numbers would also get a constant assigned? that is weird. We might have to introduce them (maybe via this rewrite logic to add a category for certain productions?)

Copy link
Member

@DavyLandman DavyLandman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope we can find a different way to fix this. maybe let the semantic tokenizer do less.

@sungshik
Copy link
Collaborator Author

For reference, here's an overview of all non-terminals in Rascal's own grammar that have a category.

@category("Constant")

lexical PreStringChars
  = @category="Constant" [\"] StringCharacter* [\<] ;

lexical MidStringChars
  = @category="Constant" [\>] StringCharacter* [\<] ;

lexical PostStringChars
  = @category="Constant" [\>] StringCharacter* [\"] ;

lexical StringConstant
  = @category="Constant" "\"" StringCharacter* chars "\"" ;

lexical CaseInsensitiveStringConstant
  = @category="Constant" "\'" StringCharacter* chars "\'" ;

lexical Char
  = @category="Constant" "\\" [\  \" \' \- \< \> \[ \\ \] b f n r t] 
  | @category="Constant" ![\  \" \' \- \< \> \[ \\ \]] 
  | @category="Constant" UnicodeEscape ;

Note: There are only string/char constants; no number constants.

@category("Comment")

lexical Comment
  = @category="Comment" "/*" (![*] | [*] !>> [/])* "*/" 
  | @category="Comment" "//" ![\n]* !>> [\ \t\r \u00A0 \u1680 \u2000-\u200A \u202F \u205F \u3000] $ ;

syntax Tag
  = @Folded @category="Comment" \default   : "@" Name name TagString contents 
  | @Folded @category="Comment" empty     : "@" Name name 
  | @Folded @category="Comment" expression: "@" Name name "=" Expression expression !>> "@" ;

Other

lexical ConcretePart
  = @category="MetaSkipped" text   : ![`\<\>\\\n]+ !>> ![`\<\>\\\n]
  | newline: "\n" [\ \t \u00A0 \u1680 \u2000-\u200A \u202F \u205F \u3000]* "\'"
  | @category="MetaVariable" hole : ConcreteHole hole
  | @category="MetaSkipped" lt: "\\\<"
  | @category="MetaSkipped" gt: "\\\>"
  | @category="MetaSkipped" bq: "\\`"
  | @category="MetaSkipped" bs: "\\\\" ;

lexical Output   
  = @category="Result" resultOutput: "⇨" ![\n\r]* [\n] 
  | @category="StdOut" stdoutOutput: ^ "≫" ![\n\r]* [\n]
  | @category="StdErr" stderrOutput: ^ "⚠" ![\n\r]* [\n] ;

@DavyLandman
Copy link
Member

I think we should dynamically add some categories to productions in the grammar. Just because the base grammar is harder to change. And then maybe when this is happening: usethesource/rascal#1928 improve the categories itself.

@sungshik
Copy link
Collaborator Author

I think we should dynamically add some categories to productions in the grammar. Just because the base grammar is harder to change. And then maybe when this is happening: usethesource/rascal#1928 improve the categories itself.

Agreed on this course of action; the latest commits bring in the additional hot-patching for numbers, strings, and locations.

@sungshik sungshik merged commit 23cca94 into main Jul 16, 2024
2 checks passed
@sungshik sungshik deleted the restore-updated-categories branch July 16, 2024 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants