-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separating the C and CN parsing #342
base: master
Are you sure you want to change the base?
Conversation
I'm keeping `c_parser.mly` and `c_lexer.mll` unchanged for now to avoid destroying `c_parser_error.messages` and to keep CN in a working state.
This does not duplicate the C grammar, instead when the CN lexer sees matching `<` ... `>` it attempts to parse a C type-name between the angles (using a new entry point to the C parser) and produces a single `LT_CTYPE_GT` token for the CN parser. This may fail because the opening `<` may in fact be a relational operator. In this case the CN lexer recovers by re-lexing from the top. This first attempt avoids changing the existing CN syntax. If we instead change it to have dedicated delimiters around C type-names, we won't need to do the re-lexing recovery. Most of the CN ci tests now work (the failures are caused by the current lack of pretty error messages in the CN parser), but this needs proper testing...
Thanks very much! AFAICS |
Regarding the question of parsing C-types within CN: maybe the lexer hack is less problematic than it seems: there's a small set of language constructs only that can take C-types ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, though this ctype issue is gonna be annoying.
let hexadecimal_constant = hexadecimal_prefix hexadecimal_digit+ | ||
|
||
(* C23 binary constant, but omitting ' separators for now *) | ||
(* TODO(Christopher/Dhruv): add support for ' separators to be in line with C23 ? *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes please! #337
(* TODO(Christopher/Dhruv): do you care about following C11 closely for this | ||
(e.g. with respect to the universal character stuff), or should we simplify | ||
like we already do for uppercase names. *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I vote simplify - we can always add it later if requested.
| '<' ([^'='][^'<' '>']+ as str) '>' | ||
{ | ||
try | ||
LT_CTYPE_GT (C_parser.type_name_eof C_lexer.lexer (Lexing.from_string str)) | ||
with | ||
| _ -> | ||
relexbuf_opt := Some (Lexing.from_string (str ^ ">")); | ||
LT | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an amazing hack. But I do worry it might be a bit too hacky. Can it be more conservative? Specifically, I wanted to have <-
available as a token. We do need to discuss this properly though @cp526
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea of ...<...>
was to have a C++-like syntax for type parameters, which would be good to keep if we can. What's the use of <-
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think types always start with a letter (or maybe _
), so you could probably change it to only try for a type, if the <
is followed by one of these. I think should allow you to have <-
later if you wanted it.
Otoh, I don't think this would work if you ever wanted types that have instantiations in them, but we probably don't have those at the moment.
"boolean", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) | ||
"CN_bool", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two can go.
We're still using |
Yes, that's true. |
"bool", BOOL; (* shared with C23 *) | ||
"boolean", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) | ||
"CN_bool", BOOL; (* TODO(Christopher/Dhruv): are all these variants of BOOL needed? *) | ||
"cn_function", FUNCTION; (* TODO(Christopher/Dhruv): is this variant still needed? *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This stays.
This creates a separate lexer and parser for CN.
There are a few questions for @cp526 and @dc-mak marked with
TODO(Christopher/Dhruv)
in the code.And there is a hack for dealing with the parsing of C type-names discussed in the commit message of c96a1ad.
This is still work-in-progress, the CN parser doesn't produce pretty messages so some CI and tutorial tests fail.