Lexical syntax of names #201

BinderDavid · 2024-05-08T13:46:18Z

Previously we just had one single declaration in lang/parser/src/cst/exp.rs.

pub type Ident = String;

This definition was used for variables, constructors, destructors, type constructors, etc.
This is no longer tenable if we want to introduce modules and qualified names. We need more structure.

I propose something like the following:

-- Names for variables: "x", "k"
<VarName> := <lowercase> (<lowercase> | <uppercase>)*
-- Names for uppercase identifiers: "Cons", "List", "Stream"
<UIdName> := <uppercase> (<lowercase> | <uppercase>)*
-- Names for lowercase identifiers: "ap", "hd", "tl"
<LIdName> := <lowercase> (<lowercase> | <uppercase>)*
-- Names for modules: "control", "std", "function"
<MName> := <lowercase> (<lowercase> | <uppercase>)*
-- Names for qualified uppercase identifiers: "Cons", "std::List", "std::List::Cons"
<QUIdName> := (<MName> "::")* (<UIdName> "::")? <UIdName>
-- Names for qualified lowercase identifiers: "std::Fun::ap", "Fun::ap"
<QLIdName> := (<MName> "::")* (<UIdName> "::")? <LIdName>

I propose that type constructors, constructors and codefinitions may only use UIdName, and that destructors and definitions may only use LIdName. This would also allow the parser to distinguish variables from calls with no arguments, e.g. x vs. Nil.
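To make the proposal concrete, here is a minimal sketch of how the single `Ident` alias could be split into separate newtypes with validating constructors. The type and function names (`UIdName::parse`, `LIdName::parse`, `is_letters`) are hypothetical and not taken from the codebase, and the sketch assumes that <lowercase> and <uppercase> mean ASCII letters.

```rust
/// Hypothetical newtype for uppercase identifiers such as "Cons" or "List".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UIdName(String);

/// Hypothetical newtype for lowercase identifiers such as "ap" or "hd".
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LIdName(String);

/// True if `s` is non-empty and consists only of ASCII letters.
/// Assumption: <lowercase> and <uppercase> are ASCII ranges.
fn is_letters(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphabetic())
}

impl UIdName {
    /// Accepts `<uppercase> (<lowercase> | <uppercase>)*`.
    pub fn parse(s: &str) -> Option<Self> {
        let first = s.chars().next()?;
        (first.is_ascii_uppercase() && is_letters(s)).then(|| UIdName(s.to_string()))
    }
}

impl LIdName {
    /// Accepts `<lowercase> (<lowercase> | <uppercase>)*`.
    pub fn parse(s: &str) -> Option<Self> {
        let first = s.chars().next()?;
        (first.is_ascii_lowercase() && is_letters(s)).then(|| LIdName(s.to_string()))
    }
}
```

With types like these, `Nil` can only be a `UIdName` and `x` can only be an `LIdName` (or a variable), so the two cannot be confused after parsing.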

timsueberkrueb · 2024-05-08T20:10:16Z

Regarding syntax: Following the Rust conventions, I would advocate for allowing underscores everywhere, and encourage (but maybe not enforce) snake_case over camelCase for definitions and destructors.
Regarding implementation: For better error messages, it is probably a good idea to keep the parser as permissive as possible and only distinguish these syntactic categories during lowering/name resolution?

BinderDavid · 2024-05-08T20:32:02Z

I agree that the parser should be permissive in general, but there are a few things that we should distinguish early in order to make our lives easier later on. For example, I don't want the parser to allow qualified names in binding positions, whether for names or for variables. E.g. I would expect the parser to reject:

data foo::Bar { bizz::Fizz( foo::x : Type,...) }
     ^^^^^^^^   ^^^^^^^^^^  ^^^^^^
          (a)      (b)        (c)

Here (a) and (b) are binding positions for names, and (c) is a binding position for a variable. The parser should reject them all. In those positions we only want to allow unqualified identifiers.
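As a rough illustration of that rule, a binder check could look like the sketch below. `BinderError` and `check_binder` are hypothetical names, and the check works on a plain string rather than on the real token stream.

```rust
/// Hypothetical error type for names that appear in binding positions.
#[derive(Debug, PartialEq, Eq)]
pub enum BinderError {
    /// The binder was empty.
    Empty,
    /// The binder contained a "::" qualifier, e.g. "foo::Bar".
    Qualified(String),
}

/// Accepts only unqualified names in binding positions.
pub fn check_binder(name: &str) -> Result<&str, BinderError> {
    if name.is_empty() {
        return Err(BinderError::Empty);
    }
    if name.contains("::") {
        return Err(BinderError::Qualified(name.to_string()));
    }
    Ok(name)
}
```

Under this sketch, all three positions (a), (b) and (c) from the example are rejected, while `Bar`, `Fizz` and `x` are accepted.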

The other thing is whether we want to enforce all module names to be lower case and all type names to be upper case.
This would allow us to distinguish already in the parser whether a name is qualified by a type vs a module. Compare:

std::result::Result (Here, std and result must be module names.)
std::result::Result::Ok (Here, std and result must be module names, and Result is a type name.)
std::function::Fun::ap (Here, std and function are module names, Fun is a type name and ap is a destructor.)
std::function::compose (Here, std and function are module names, and compose is a toplevel definition.)
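The case-based reading of these four paths can be sketched as follows. This is only an illustration under the assumption that module names are always lowercase and type names always uppercase; `Qualified` and `classify` are hypothetical names, and identifiers are assumed to consist of ASCII letters.

```rust
/// Hypothetical result of splitting a qualified name by casing.
#[derive(Debug, PartialEq, Eq)]
pub struct Qualified {
    pub modules: Vec<String>,
    pub type_name: Option<String>,
    pub name: String,
}

/// True if `s` is non-empty and consists only of ASCII letters.
fn is_ident(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_ascii_alphabetic())
}

fn starts_upper(s: &str) -> bool {
    s.chars().next().map_or(false, |c| c.is_ascii_uppercase())
}

fn starts_lower(s: &str) -> bool {
    s.chars().next().map_or(false, |c| c.is_ascii_lowercase())
}

/// Splits `path` following `(<MName> "::")* (<UIdName> "::")? <name>`.
/// Returns `None` if the path does not fit that shape.
pub fn classify(path: &str) -> Option<Qualified> {
    let segments: Vec<&str> = path.split("::").collect();
    let (name, prefix) = segments.split_last()?;
    if !is_ident(name) {
        return None;
    }
    // An uppercase segment directly before the final name is a type qualifier.
    let (type_name, modules) = match prefix.split_last() {
        Some((last, rest)) if starts_upper(last) => (Some(*last), rest),
        _ => (None, prefix),
    };
    if let Some(t) = type_name {
        if !is_ident(t) {
            return None;
        }
    }
    // Every remaining qualifier must be a lowercase module name.
    if !modules.iter().all(|m| is_ident(m) && starts_lower(m)) {
        return None;
    }
    Some(Qualified {
        modules: modules.iter().map(|m| m.to_string()).collect(),
        type_name: type_name.map(|t| t.to_string()),
        name: name.to_string(),
    })
}
```

For `std::function::Fun::ap` this yields the modules `std` and `function`, the type `Fun` and the name `ap`; a path such as `std::List::result::x` is rejected because an uppercase segment appears in a module position.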

timsueberkrueb · 2024-05-08T20:39:45Z

Yeah, we should enforce lowercase module names, but it may be easier to report multiple error messages during lowering/name resolution than to make parse errors recoverable. It would certainly be more fine-grained.
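A sketch of what that could look like: the parser accepts any identifier, and a later pass collects every casing violation instead of stopping at the first one. `Expected`, `CaseError` and `check_casing` are hypothetical names, and the input is simplified to a list of name/expectation pairs.

```rust
/// Hypothetical casing expectation attached to a name by its syntactic position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Expected {
    Upper,
    Lower,
}

/// Hypothetical error reported for a name whose first letter has the wrong case.
#[derive(Debug, PartialEq, Eq)]
pub struct CaseError {
    pub name: String,
    pub expected: Expected,
}

/// Checks all names and returns every violation, in input order.
pub fn check_casing(names: &[(&str, Expected)]) -> Vec<CaseError> {
    names
        .iter()
        .filter_map(|&(name, expected)| {
            let first = name.chars().next();
            let ok = match expected {
                Expected::Upper => first.map_or(false, |c| c.is_ascii_uppercase()),
                Expected::Lower => first.map_or(false, |c| c.is_ascii_lowercase()),
            };
            if ok {
                None
            } else {
                Some(CaseError { name: name.to_string(), expected })
            }
        })
        .collect()
}
```

Because the result is a list, one run can report both an uppercase module name and a lowercase type name, which a parser that aborts on the first error would not do.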

BinderDavid mentioned this issue May 8, 2024

Make ident abstract #202

Merged
