-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Machine readable syntax specification #3707
base: master
Are you sure you want to change the base?
Conversation
Cadence Benchstat comparisonThis branch with compared with the base branch onflow:master commit 24783e3 Collapsed results for better readability
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
|
||
### Set Rule | ||
|
||
Set rules consist of a sequence of intervals separated by a comma (`,`). These intervals can either represent a single Unicode code point or a range of code points, separated by a hyphen (`-`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't alternations with |
also sets? Could we maybe just consistently use |
to avoid unnecessarily adding a new operator uncommon in other notions
; | ||
|
||
<identifierToken>: | ||
"ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz" *identifierFollower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to express these as ranges? e.g. [A-Z_a-z]
like in flex?
|
||
* | ||
program: | ||
*declaration( declarationSeparator ) $ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make it more explicit that the notation here means "separated by"? The parentheses look like grouping (like in other notations), and at first glance it looks like they are unnecessary and can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This language does not have a grouping operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, could you please leave a comment above, with an equivalent rule in somehting like EBNF notation? Same for the comments above (adding a comment for ranges)
access contractKeyword interfaceKeyword nonReservedIdentifier *1interfaceInheritance "{" membersAndNestedDeclarations "}" | ||
; | ||
|
||
enumInterfaceDeclaration: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to indicate that enum interfaces are illegal, but this rule exists to produce better error messages. We might in general make this syntax definition as strict as possible, so maybe just remove the rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we should have two versions of the syntax. A strict grammar that is allowed by the compiler, and a more general, internal syntax that is used to produce useful error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. To begin with we might at least indicate which variant of the syntax this is
accessPriv | ||
| accessPub |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub
, pub(set)
, and priv
got removed in Cadence 1.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax is focused on duplicating the behavior of the interpreter's parser. These would produce errors in the semantic checks. That should also be part of the specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe at least leave a comment documenting this decision, e.g. something like // Invalid since Cadence 1.0, so not part of the strict syntax
; | ||
|
||
fullTypeOptional: | ||
. optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .
looks like "any character" here, like in other notations. Maybe leave a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .
and _
were chosen to represent adjacent and not-adjacent. The language for the parser section does not match individual characters. It matches tokens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's OK. It might not be obvious to readers unfamiliar with this custom notation, and given the .
notation means something else in other commonly used notations, maybe help the reader here and point out that this does not mean "any character" but "adjacent"
move | ||
; | ||
|
||
expression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The precedence numbers are an implementation detail of the current parser implementation, maybe remove them from the document as they are not necessary and probably just confusing. The rules are already structured in a way where they encode the precedence rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the intention. They were there only to help define the grammar. I forgot to remove them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another pass over types as requested. Mainly the precedence rules need to be encoded properly, and the only other thing is the intersection types.
Also did another pass over all declarations, and left a few minor comments, especially regarding what types are allowed.
Will do another pass over expressions tomorrow.
; | ||
|
||
functionDeclaration: | ||
access *1viewKeyword funKeyword nonReservedIdentifier parameterList *1functionDeclarationTypeAnnotation *1functionBlock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and above: The functionBlock
has an "optional" multiplicity here, but this is only the case for some situations, e.g. in interface declarations. Otherwise it is required. We can leave this as-is, but maybe leave a note
; | ||
|
||
parseLabeledParameter: | ||
parameterPublicName parameterLocalName ":" typeAnnotation *1defaultValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename parameterPublicName
to argumentLabel
, and parameterLocalName
to just parameterName
, to match the terminology we have elsewhere
; | ||
|
||
parseLabeledParameter: | ||
parameterPublicName parameterLocalName ":" typeAnnotation *1defaultValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defaultValue
is only valid in some cases, in resource destroyed event declarations. Maybe leave a note that we parse more than semantically allowed
|
||
conformances: | ||
conformancesTypes | ||
| conformancesEmpty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be simplified to just | empty
, do we need a separate empty rule conformancesEmpty
?
; | ||
|
||
structInterfaceDeclaration: | ||
access structKeyword interfaceKeyword nonReservedIdentifier *1interfaceInheritance "{" membersAndNestedDeclarations "}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does interfaceInheritance
differ from conformances
? Maybe remove *1interfaceInheritance
and just use conformances
here and for the other interface rules below
|
||
escapedCharacter: | ||
escapedUnicodeScalar | ||
| escapedCCharacter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential typo (extra C
), maybe rename to escapedCharacter
; | ||
|
||
functionType: | ||
funKeyword "(" *1functionTypeParameters ")" *1functionTypeReturn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function types may have a view
modifier
fullType: | ||
fullTypeNormal | ||
| fullTypeNested | ||
; | ||
|
||
fullTypeNormal: | ||
*1fullTypeReference innerType *fullTypeOptional | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe encode the precedence rules for types in nested rules, like it is done for expressions, see the typeLeftBindingPower*
constants. For example, references bind tighter than optionals
| dictionaryType | ||
; | ||
|
||
typeRestrictions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restriction types got removed in Cadence 1.0. Instead, there are only intersection types, which have this syntax, but not as a "suffix" to a type. For example, before a type T
could be restricted with T{...}
, where ...
is a list of zero or more nominal types. Now there is just the syntax for intersection types {...}
where ...
is a list of one or more nominal types.
; | ||
|
||
interfaceInheritance: | ||
":" 1*fullType( "," ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nominal types are allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did another pass reviewing the expression parsing. Looks good, a couple notes.
Note that some expressions need newline differentiation to resolve ambiguities when expressions are used as statements, see the "(* if no line terminator ahead *)" comments in the expression rules of the EBNF grammar, like
Line 457 in 5364220
| postfixExpression (* if no line terminator ahead *) invocation |
For example:
foo(bar) // invocation
foo // identifier expression
(bar) // parenthesized expression
; | ||
|
||
comparisonExpression: // 50 | ||
1*2nilCoalescingExpression( comparisonOp ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though type checking might reject nested comparison expressions, parsing still allows them.
So this should not have an upper bound in the multiplicity:
1*2nilCoalescingExpression( comparisonOp ) | |
1*nilCoalescingExpression( comparisonOp ) |
primaryExpression: | ||
nullDenotation *leftDenotation | ||
; | ||
|
||
nullDenotation: | ||
identifierExpression | ||
; | ||
|
||
leftDenotation: | ||
forceUnwrapOp | ||
| accessExpression | ||
| invocationExpression | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"null denotation" and "left denotation" are terminology of Pratt's Top Down Operator Precedence parsing technique, i.e. are an implementation detail of the current Go parser. They shouldn't be used as rule names.
*1prefixOp postfixExpression | ||
; | ||
|
||
postfixExpression: // 160 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force unwrapping is on this precedence level
leftDenotation: | ||
forceUnwrapOp | ||
| accessExpression | ||
| invocationExpression | ||
; | ||
|
||
accessExpression: // 170 | ||
memberExpression | ||
| indexExpression | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
member expression, indexing expression, and invocation expression are all part of the same precedence and should probably be in the same rule. As noted above, force unwrapping has lower precedence (needs to be in a "higher" rule)
Closes #3686
Description
Added a machine readable syntax specification for the Cadence language and a Syntax Notation document.
master
branchFiles changed
in the Github PR explorer