Support Unicode escape sequences in characters #4

Open
Mesabloo opened this issue Aug 30, 2020 · 1 comment
Labels
discussion When you want to discuss about something enhancement New feature or request parsing Is the syntax parsed well enough ?

Comments

@Mesabloo
Member

Mesabloo commented Aug 30, 2020

Character literals do not currently accept Unicode escape sequences of the form \uHHHH and \uHHHHHHHH (where H is any hexadecimal digit).
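
To make the proposed syntax concrete, here is a minimal lexer sketch in Python, purely for illustration. The helper `decode_unicode_escape` is hypothetical and not part of N*. It assumes two design choices the issue does not settle: the 8-digit form wins whenever 8 hexadecimal digits follow `\u`, and surrogate halves (U+D800 to U+DFFF) and values above U+10FFFF are rejected, as many languages do for their char type.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_unicode_escape(src: str, pos: int = 0) -> tuple[int, int]:
    """Decode a \\uHHHH or \\uHHHHHHHH escape starting at src[pos].

    Returns (code_point, index just past the escape).
    Hypothetical helper for illustration; not part of N*.
    """
    if not src.startswith("\\u", pos):
        raise ValueError("expected a '\\u' escape")
    start = pos + 2
    # Count up to 8 consecutive hexadecimal digits after "\u".
    end = start
    while end < len(src) and end - start < 8 and src[end] in HEX_DIGITS:
        end += 1
    count = end - start
    # Assumption: prefer the 8-digit form when 8 hex digits follow,
    # otherwise read exactly 4 digits.
    if count == 8:
        width = 8
    elif count >= 4:
        width = 4
    else:
        raise ValueError("\\u escape needs 4 or 8 hexadecimal digits")
    code_point = int(src[start:start + width], 16)
    # Assumption: reject values that are not Unicode scalar values
    # (beyond U+10FFFF, or UTF-16 surrogate halves).
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"U+{code_point:X} is not a valid Unicode scalar value")
    return code_point, start + width


print(hex(decode_unicode_escape(r"\u0001F600")[0]))  # 0x1f600
```

Inside a character literal the closing quote limits how many digits can follow, so the "longest form wins" rule is mostly harmless there; in string literals it would need more thought.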

Supporting them would be useful. However, this may complicate typechecking somewhat. What is the size of a Unicode character? How well would it integrate with compilation? Are there any currently known problems with supporting Unicode characters?

These questions need to be answered first; whether this gets worked on will be decided after that.


Character literals may also suffer from the same bug as #3, which would need to be fixed as well.

@Mesabloo Mesabloo added enhancement New feature or request good first issue Good for newcomers parsing Is the syntax parsed well enough ? discussion When you want to discuss about something labels Aug 30, 2020
@Mesabloo Mesabloo added this to the N* v1 milestone Aug 30, 2020
@Mesabloo Mesabloo modified the milestones: N* v1, N* v2 Sep 23, 2020
@Mesabloo Mesabloo removed this from the N* v3 milestone Mar 14, 2021
@Mesabloo Mesabloo moved this to Todo in Issue tracker Nov 1, 2021
@Mesabloo Mesabloo moved this from Todo to Discussing in Issue tracker Nov 1, 2021
@Mesabloo
Member Author

What is the size of a Unicode character?

Languages that support Unicode characters out of the box commonly use 32-bit integers to represent characters (the largest code point, U+10FFFF, needs 21 bits). This is somewhat wasteful in most scenarios, but it allows any Unicode code point to be stored.
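
A quick Python illustration of both halves of that trade-off: every code point fits in 32 bits, but a fixed 4-byte-per-character encoding (UTF-32) takes more space than a variable-width one (UTF-8) for typical text. The sample string is arbitrary.

```python
# The largest Unicode code point is U+10FFFF, which needs 21 bits,
# so a 32-bit integer can hold every code point.
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT.bit_length())  # 21

text = "a\u00e9\U0001F600"  # 'a', 'é' and an emoji: 3 code points
# Fixed-width 32-bit encoding: always 4 bytes per code point.
print(len(text.encode("utf-32-le")))  # 12
# Variable-width UTF-8 for comparison: 1 + 2 + 4 bytes.
print(len(text.encode("utf-8")))  # 7
```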

How well would it integrate with compilation?

An opaque built-in char type should be enough. It can be treated as a u32 or s32 when compiling.
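
As a sketch of what "treat it as a u32 or s32" could mean at code generation time, the hypothetical helper `emit_char_constant` below lowers a character to a 4-byte little-endian integer. Little-endian layout is an assumption about the target. Since every code point is below 2^31, the signed and unsigned encodings produce identical bytes, so either choice works.

```python
import struct


def emit_char_constant(c: str, signed: bool = False) -> bytes:
    """Encode a single character as a 4-byte little-endian integer.

    Hypothetical codegen helper; not N*'s actual backend.
    """
    if len(c) != 1:
        raise ValueError("expected exactly one code point")
    # "<I" is an unsigned 32-bit int, "<i" a signed one.
    fmt = "<i" if signed else "<I"
    return struct.pack(fmt, ord(c))


print(emit_char_constant("\u00e9").hex())  # e9000000
```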

Will there be any currently known problems with supporting Unicode characters?

Some functions from the C standard library (e.g. strlen) do not count Unicode code points, but rather bytes (8-bit units).
However, this is technically not N⋆'s problem.
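
The difference is easy to see with UTF-8 encoded text. This Python sketch compares the byte count (what strlen would report, excluding the terminating NUL) with a code point count obtained by skipping UTF-8 continuation bytes; `utf8_code_point_count` is a hypothetical helper and assumes its input is valid UTF-8.

```python
def utf8_code_point_count(data: bytes) -> int:
    # Continuation bytes have the bit pattern 10xxxxxx;
    # every other byte starts a new code point (assumes valid UTF-8).
    return sum(1 for b in data if b & 0xC0 != 0x80)


s = "h\u00e9llo"  # "héllo"
data = s.encode("utf-8")
print(len(data))  # 6: the byte count strlen would report
print(utf8_code_point_count(data))  # 5
```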

@Mesabloo Mesabloo added this to the Later milestone Sep 24, 2022
@Mesabloo Mesabloo removed the good first issue Good for newcomers label Sep 24, 2022
Projects
Status: Discussing
Development

No branches or pull requests

1 participant