Support Unicode escape sequences in characters #4

Mesabloo · 2020-08-30T16:05:37Z

Character literals do not currently accept Unicode escape sequences of the form \uHHHH and \uHHHHHHHH (where H is any hexadecimal digit).

Supporting these would be a good thing. However, it may complicate typechecking a little bit. What is the size of a Unicode character? How well would it integrate with compilation? Will there be any currently known problems with supporting Unicode characters?

These questions need to be answered first. Whether this gets worked on will be decided after that step.

Characters may also suffer from the same bug as #3. If so, that will need to be fixed as well.
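To make the two escape shapes concrete, here is a minimal sketch of how they could be decoded. It is written in Python purely for illustration and is not taken from the N⋆ codebase: `decode_unicode_escape` and `MAX_CODE_POINT` are hypothetical names, and the range check against U+10FFFF is an assumption about what the parser should reject.

```python
import re

# Hypothetical helper, not part of N*: accepts exactly \uHHHH or \uHHHHHHHH.
# The 8-digit alternative is listed first so it is tried before the 4-digit one.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{8}|[0-9A-Fa-f]{4})")

MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines


def decode_unicode_escape(text: str) -> int:
    """Return the code point denoted by a 4- or 8-digit Unicode escape."""
    match = _ESCAPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a Unicode escape: {text!r}")
    code_point = int(match.group(1), 16)
    if code_point > MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {code_point:#x}")
    return code_point


if __name__ == "__main__":
    print(hex(decode_unicode_escape("\\u00e9")))
    print(hex(decode_unicode_escape("\\u0001F600")))
```

Anything that is not exactly four or eight hexadecimal digits is rejected, which avoids ambiguity with inputs such as a five-digit escape. Whether surrogate code points (U+D800 to U+DFFF) should also be rejected is left open here.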


Mesabloo · 2022-07-13T08:20:59Z

What is the size of a Unicode character?

Many languages that support Unicode characters out of the box (Rust's char, for instance) represent a character as a 32-bit integer. This is a little wasteful in most scenarios, since code points only go up to U+10FFFF, but it allows storing any Unicode character.
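The trade-off can be checked with a few lines of arithmetic. The sketch below (Python, for illustration only) computes how many bits the largest code point needs and compares a fixed 32-bit encoding with UTF-8 for three sample characters. The sample characters are arbitrary choices.

```python
MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines

bits_needed = MAX_CODE_POINT.bit_length()  # bits required for any code point
unused_bits = 32 - bits_needed             # slack in a 32-bit representation

# Bytes per character in a variable-width encoding (UTF-8) versus a fixed
# 32-bit one (UTF-32), for an ASCII letter, an accented letter, and an emoji.
sizes = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-32-le")))
    for ch in ("a", "\u00e9", "\U0001F600")
}

print(bits_needed, unused_bits)
print(sizes)
```

Any code point fits in 21 bits, so a 32-bit slot always has room to spare, and an ASCII character costs four bytes where UTF-8 would use one.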

How well would it integrate with compilation?

An opaque builtin char type should pretty much do the job. We can treat it as a u32 or s32 when compiling.
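As a rough illustration of that lowering, the sketch below packs a character's code point into a 4-byte word. `lower_char` is a hypothetical name, and little-endian byte order is an assumption made only so the output is concrete. It also shows why the choice between u32 and s32 does not change the stored bytes: no code point reaches the sign bit.

```python
import struct


def lower_char(ch: str) -> bytes:
    """Hypothetical lowering of one character to a little-endian 32-bit word."""
    return struct.pack("<I", ord(ch))  # "<I" = little-endian unsigned 32-bit


def lower_char_signed(ch: str) -> bytes:
    """Same lowering, but going through a signed 32-bit integer."""
    return struct.pack("<i", ord(ch))  # "<i" = little-endian signed 32-bit


if __name__ == "__main__":
    for ch in ("a", "\u00e9", "\U0001F600"):
        # Both encodings agree because code points never exceed 0x10FFFF.
        print(lower_char(ch).hex(), lower_char(ch) == lower_char_signed(ch))
```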

Will there be any currently known problems with supporting Unicode characters?

Some functions from the C standard library (e.g. strlen) do not count in Unicode code points, but rather in 8-bit units (bytes).
However, this is technically not N⋆'s problem.
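The mismatch is easy to reproduce. The sketch below assumes the string is stored as UTF-8 (an assumption, nothing above fixes the encoding) and uses Python's `len` on the encoded bytes to stand in for what strlen would report on the same buffer.

```python
text = "h\u00e9llo"                # five code points, one of them non-ASCII
encoded = text.encode("utf-8")     # the bytes a C char* would hold, minus the NUL

byte_count = len(encoded)          # what a byte-counting function such as strlen sees
code_point_count = len(text)       # Python's len on str counts code points

print(byte_count, code_point_count)
```

The two counts differ by one here because the accented letter takes two bytes in UTF-8.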

Mesabloo added the enhancement, good first issue, parsing, and discussion labels Aug 30, 2020

Mesabloo added this to the N* v1 milestone Aug 30, 2020

Mesabloo mentioned this issue Sep 2, 2020

Merge working parser into master #7

Merged

Mesabloo modified the milestones: N* v1, N* v2 Sep 23, 2020

Mesabloo removed this from the N* v3 milestone Mar 14, 2021

Mesabloo added this to Issue tracker Nov 1, 2021

Mesabloo moved this to Todo in Issue tracker Nov 1, 2021

Mesabloo moved this from Todo to Discussing in Issue tracker Nov 1, 2021

Mesabloo added this to the Later milestone Sep 24, 2022

Mesabloo removed the good first issue label Sep 24, 2022
