Feature request: More UTF-8 string functions #13
Comments
left() and right() are special cases of mid(), so to keep the API lean, I'd just implement a single substring function like mid(). upper() and lower() are tougher because the usual locale handling is platform-specific (e.g. setlocale()'s support for UTF-8 is not guaranteed). Unfortunately, I'd probably defer to other libraries that specifically handle Unicode.
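To illustrate the point that left() and right() reduce to mid(), here is a minimal sketch in C. The names `u8_offset`, `u8_count`, `u8_mid`, `u8_left` and `u8_right` are hypothetical and are not part of SDL_FontCache; the code assumes valid, NUL-terminated UTF-8 input and counts code points, not grapheme clusters.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: byte offset of the code point at index n,
   or the byte length of s if n is past the end.
   Assumes s is valid, NUL-terminated UTF-8. */
static size_t u8_offset(const char *s, size_t n)
{
    size_t i = 0;
    while (s[i] != '\0' && n > 0) {
        i++;
        /* Skip continuation bytes, which look like 10xxxxxx. */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return i;
}

/* Hypothetical helper: number of code points in s. */
static size_t u8_count(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical mid(): copy up to `count` code points starting at
   code point index `start`. Returns a malloc'd string that the
   caller must free, or NULL if allocation fails. */
static char *u8_mid(const char *s, size_t start, size_t count)
{
    size_t begin = u8_offset(s, start);
    size_t len = u8_offset(s + begin, count);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s + begin, len);
    out[len] = '\0';
    return out;
}

/* left() is mid() starting at index 0. */
static char *u8_left(const char *s, size_t count)
{
    return u8_mid(s, 0, count);
}

/* right() is mid() starting `count` code points before the end. */
static char *u8_right(const char *s, size_t count)
{
    size_t total = u8_count(s);
    return u8_mid(s, total > count ? total - count : 0, count);
}
```

Returning a freshly allocated string is only one possible design; a version that writes into a caller-supplied buffer would avoid the allocation at the cost of a more awkward signature.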
Yeah okay, that sounds reasonable I guess. It would have been great for me personally to have all these functions in one place so I would not have to include a bunch of different files every time I work with UTF-8 text, but I also understand that you do not want to bloat your project too much with these kinds of things. As for the cases, I have been using this myself to convert Russian UTF-8 text between lower and upper case:
And to get the upper case you would just do -= 32 instead. But I guess this only works for plain English or Russian UTF-8 letters/characters.
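The snippet referred to above is not reproduced here, so the following is only a sketch of the offset-by-32 idea, not the commenter's code. It relies on two facts: ASCII A-Z and a-z differ by 32, and Cyrillic А-Я (U+0410 to U+042F) and а-я (U+0430 to U+044F) also differ by 32 as code points. The offset does not hold at the byte level, because р-я encode with lead byte 0xD1 while А-Я all use 0xD0, so the sketch decodes each two-byte sequence, adjusts the code point, and re-encodes it. Ё/ё (U+0401/U+0451) are 80 apart and get a special case. The function name `u8_lower_ru` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: lowercase ASCII A-Z and Cyrillic А-Я plus Ё,
   in place. Assumes s is valid, NUL-terminated UTF-8. Every other
   character is left untouched. The byte length never changes because
   each upper/lower pair handled here has the same encoded size. */
static void u8_lower_ru(char *s)
{
    unsigned char *p = (unsigned char *)s;
    while (*p != '\0') {
        if (*p >= 'A' && *p <= 'Z') {
            *p += 32;                       /* ASCII: one byte */
            p += 1;
        } else if ((*p & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence: 110xxxxx 10xxxxxx */
            unsigned cp = ((unsigned)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            if (cp >= 0x0410 && cp <= 0x042F)
                cp += 32;                   /* А-Я -> а-я */
            else if (cp == 0x0401)
                cp = 0x0451;                /* Ё -> ё, offset is 80 */
            p[0] = (unsigned char)(0xC0 | (cp >> 6));
            p[1] = (unsigned char)(0x80 | (cp & 0x3F));
            p += 2;
        } else {
            p += 1;                         /* bytes of longer sequences */
        }
    }
}
```

The upper-case direction would subtract 32 over the а-я range instead. Characters outside these ranges, such as ß, pass through unchanged, which is why this approach does not generalize to full Unicode case mapping.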
I'm not sure about the Unicode specification and whether it always separates upper/lower pairs by 32 for all languages. That'd be really great if it did! One clear exception to that and a few other rules is the German ß (though not in all German locales), which has no actual capitalization, but can sometimes be rendered in caps as SS. This is a single character that would map to 2. :-/ Regardless, if someone did contribute a robust capitalization scheme, even just for a couple of locales, I'd put it in and accept contributions to fill it out further.
Would it be possible to add a few more utility functions for dealing with UTF-8 text?
These are some of the functions that already exist in SDL_FontCache:
U8_strlen
U8_charsize
U8_charcpy
U8_next
U8_strinsert
U8_strdel
I would love to also see something like this:
U8_mid
U8_left
U8_right
U8_upper
U8_lower
These kinds of functions can be found in a lot of programming languages, and it should be fairly obvious what they do. As there is no way to do this with UTF-8 text in C or C++ by default, I think this would be a great additional feature of SDL_FontCache.
I have no idea how difficult it is to convert between upper and lower case UTF-8, but I think at least the mid, left and right functions should not require too much code since you already have functions such as U8_next.