Lexical syntax of names #201

Open
BinderDavid opened this issue May 8, 2024 · 3 comments

Comments

@BinderDavid
Collaborator

Previously, we had a single declaration in lang/parser/src/cst/exp.rs:

pub type Ident = String;

This definition was used for variables, constructors, destructors, type constructors, etc.
This is no longer tenable if we want to introduce modules and qualified names. We need more structure.

I propose something like the following:

-- Names for variables: "x", "k"
<VarName> := <lowercase> (<lowercase> | <uppercase>)*
-- Names for uppercase identifiers: "Cons", "List", "Stream"
<UIdName> := <uppercase> (<lowercase> | <uppercase>)*
-- Names for lowercase identifiers: "ap", "hd", "tl"
<LIdName> := <lowercase> (<lowercase> | <uppercase>)*
-- Names for modules: "control", "std", "function"
<MName> := <lowercase> (<lowercase> | <uppercase>)*
-- Names for qualified uppercase identifiers: "Cons", "std::List", "std::List::Cons"
<QUIdName> := (<MName> "::")* (<UIdName> "::")? <UIdName>
-- Names for qualified lowercase identifiers: "std::Fun::ap", "Fun::ap"
<QLIdName> := (<MName> "::")* (<UIdName> "::")? <LIdName>
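
As a sanity check on the grammar, here is a rough Rust sketch of recognizers for these categories. It is hypothetical, not code from the repository: the function names (is_lower_name, is_uid_name, is_quid_name, is_qlid_name) are made up, and it assumes <lowercase> and <uppercase> mean ASCII letters only, with no digits or underscores, exactly as the grammar is written. Since <VarName>, <LIdName> and <MName> have the same lexical shape, one function covers all three.

```rust
/// True if `s` is non-empty, its first char satisfies `first`, and every
/// remaining char is an ASCII letter: `<first> (<lowercase> | <uppercase>)*`.
fn letters_with_first(s: &str, first: fn(&char) -> bool) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if first(&c) => chars.all(|c| c.is_ascii_alphabetic()),
        _ => false,
    }
}

/// <VarName>, <LIdName> and <MName>: all start with a lowercase letter.
fn is_lower_name(s: &str) -> bool {
    letters_with_first(s, char::is_ascii_lowercase)
}

/// <UIdName>: starts with an uppercase letter.
fn is_uid_name(s: &str) -> bool {
    letters_with_first(s, char::is_ascii_uppercase)
}

/// Shared shape of the qualified names: (<MName> "::")* (<UIdName> "::")? <last>
fn is_qualified(s: &str, last_ok: fn(&str) -> bool) -> bool {
    let segs: Vec<&str> = s.split("::").collect();
    let (last, prefix) = match segs.split_last() {
        Some(parts) => parts,
        None => return false,
    };
    if !last_ok(*last) {
        return false;
    }
    // At most one type name may appear, and only directly before `last`.
    let modules = match prefix.split_last() {
        Some((ty, rest)) if is_uid_name(ty) => rest,
        _ => prefix,
    };
    modules.iter().all(|m| is_lower_name(m))
}

/// <QUIdName>
fn is_quid_name(s: &str) -> bool {
    is_qualified(s, is_uid_name)
}

/// <QLIdName>
fn is_qlid_name(s: &str) -> bool {
    is_qualified(s, is_lower_name)
}
```

With these definitions, "std::List::Cons" and "Fun::ap" are accepted, while "List::std::Cons" is rejected because a module name may not follow a type name.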

I propose that type constructors, constructors and codefinitions may only use UIdName, and destructors and definitions only LIdName. This would also allow the parser to distinguish variables from calls with no arguments, e.g. x vs. Nil.
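
A minimal sketch of how the parser could use the first character to tell x from Nil. The Atom enum and classify_atom are hypothetical names. Following the proposal, it assumes that a bare lowercase identifier in expression position is parsed as a variable, and it only inspects the first character, leaving validation of the rest of the name to the lexer.

```rust
/// Hypothetical CST node for an expression consisting of one identifier.
#[derive(Debug, PartialEq)]
enum Atom {
    /// Lowercase identifier such as `x` or `k`: parsed as a variable.
    Var(String),
    /// Uppercase identifier such as `Nil`: parsed as a call with no arguments.
    NullaryCall(String),
}

/// Classifies an unqualified identifier by the case of its first character.
/// Returns None if the identifier does not start with an ASCII letter.
fn classify_atom(ident: &str) -> Option<Atom> {
    match ident.chars().next() {
        Some(c) if c.is_ascii_lowercase() => Some(Atom::Var(ident.to_string())),
        Some(c) if c.is_ascii_uppercase() => Some(Atom::NullaryCall(ident.to_string())),
        _ => None,
    }
}
```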

@timsueberkrueb
Collaborator

Regarding syntax: Following Rust conventions, I would advocate allowing underscores everywhere, and encouraging (but maybe not enforcing) snake_case over camelCase for definitions and destructors.
Regarding implementation: For better error messages, it is probably a good idea to keep the parser as permissive as possible and only distinguish these syntactic categories during lowering/name resolution?
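
A rough sketch of this split, under two assumptions not settled in the thread: the lexer accepts ASCII letters and underscores in any position, and the snake_case convention is reported as a warning during lowering rather than as an error. The function names is_permissive_ident and snake_case_warning are hypothetical.

```rust
/// Permissive lexer rule (assumption): one or more ASCII letters or
/// underscores. Case is not checked here; that is left to lowering.
fn is_permissive_ident(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphabetic() || c == '_')
}

/// Lint for definition and destructor names, run during lowering:
/// an uppercase letter suggests camelCase, so emit a warning (not an error).
fn snake_case_warning(name: &str) -> Option<String> {
    if name.chars().any(|c| c.is_ascii_uppercase()) {
        Some(format!("warning: `{}` should be written in snake_case", name))
    } else {
        None
    }
}
```

Because the parser accepts both spellings, a name like foldLeft still parses; the user just gets a warning later.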

@BinderDavid
Collaborator Author

BinderDavid commented May 8, 2024

I agree that the parser should be permissive in general, but there are a few things we should distinguish early to make our life easier later on. For example, I don't want the parser to allow qualified names in binding positions, whether for names or for variables. I would expect the parser to reject:

data foo::Bar { bizz::Fizz( foo::x : Type,...) }
     ^^^^^^^^   ^^^^^^^^^^  ^^^^^^
          (a)      (b)        (c)

Here (a) and (b) are binding positions for names, and (c) is a binding position for a variable. The parser should reject them all. In those positions we only want to allow unqualified identifiers.
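
This check is cheap to do in the parser. A minimal sketch, assuming the parser has already read a (possibly qualified) path as a string; BindingPos and expect_unqualified are hypothetical names, and the three variants correspond to (a), (b) and (c) above.

```rust
/// Kind of binding position, matching (a), (b) and (c) in the example.
#[derive(Debug, Clone, Copy)]
enum BindingPos {
    TypeName, // (a) the type being declared
    CtorName, // (b) a constructor of that type
    Variable, // (c) a parameter of the constructor
}

/// Every binding position accepts only a single, unqualified segment.
/// Returns the bound name, or an error message naming the position.
fn expect_unqualified(pos: BindingPos, name: &str) -> Result<&str, String> {
    if name.contains("::") {
        Err(format!("{:?}: qualified name `{}` cannot be bound here", pos, name))
    } else {
        Ok(name)
    }
}
```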

The other question is whether we want to enforce that all module names are lowercase and all type names are uppercase.
This would let the parser already distinguish whether a name is qualified by a type or by a module. Compare:

  • std::result::Result (Here, std and result must be module names.)
  • std::result::Result::Ok (Here, std and result must be module names, and Result a type name.)
  • std::function::Fun::ap (Here, std and function are module names, Fun is a type, and ap a destructor.)
  • std::function::compose (Here, std and function are module names, and compose is a toplevel definition.)
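
With that convention in place, a qualified path can be split purely by case. A hypothetical sketch (QualifiedName and split_qualified are invented names) that assumes ASCII identifiers and treats the last segment as the item being referred to, at most one uppercase segment before it as the qualifying type, and all earlier segments as modules:

```rust
/// Hypothetical structured form of a qualified name.
#[derive(Debug, PartialEq)]
struct QualifiedName {
    modules: Vec<String>, // e.g. ["std", "function"]
    ty: Option<String>,   // e.g. Some("Fun")
    item: String,         // e.g. "ap"
}

/// Splits `path` on "::" and classifies each prefix segment by its first
/// character: lowercase means module, uppercase means type. Modules must
/// come first, and at most one type may follow them.
fn split_qualified(path: &str) -> Result<QualifiedName, String> {
    let segs: Vec<&str> = path.split("::").collect();
    // `split` always yields at least one segment, so this cannot panic.
    let (item, prefix) = segs.split_last().unwrap();
    if !item.starts_with(|c: char| c.is_ascii_alphabetic()) {
        return Err(format!("missing or malformed final segment in `{}`", path));
    }
    let mut modules = Vec::new();
    let mut ty: Option<String> = None;
    for seg in prefix {
        match seg.chars().next() {
            // A module segment is only valid before any type segment.
            Some(c) if c.is_ascii_lowercase() && ty.is_none() => modules.push(seg.to_string()),
            // At most one type segment is allowed.
            Some(c) if c.is_ascii_uppercase() && ty.is_none() => ty = Some(seg.to_string()),
            _ => return Err(format!("unexpected segment `{}` in `{}`", seg, path)),
        }
    }
    Ok(QualifiedName { modules, ty, item: item.to_string() })
}
```

For std::function::Fun::ap this yields modules ["std", "function"], type Fun and item ap, while std::result::Result has no qualifying type and Result as the item.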

@timsueberkrueb
Collaborator

Yeah, we should enforce lowercase module names, but it may be easier to report multiple errors during lowering/name resolution than to make parse errors recoverable. The resulting errors would certainly be more fine-grained, too.
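
A sketch of this approach, assuming lowering receives the module names as plain strings: instead of returning on the first bad name, the hypothetical check_module_names function collects one message per violation, so the user sees all of them at once.

```rust
/// Checks that every module name starts with a lowercase ASCII letter,
/// collecting all violations rather than stopping at the first one.
fn check_module_names(names: &[&str]) -> Result<(), Vec<String>> {
    let errors: Vec<String> = names
        .iter()
        .filter(|name| !name.starts_with(|c: char| c.is_ascii_lowercase()))
        .map(|name| format!("module name `{}` must start with a lowercase letter", name))
        .collect();
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Calling it with ["std", "Control", "Data"] returns two errors, one each for Control and Data.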


2 participants