Message.Entities returns Length of UTF16 encoded string, not UTF8 supported by Golang #231
42wim added a commit to 42wim/matterbridge that referenced this issue on Jul 6, 2019: …#857 Besides the bound checking, this now also use utf16 as suggested by go-telegram-bot-api/telegram-bot-api#231
42wim added a commit to 42wim/matterbridge that referenced this issue on Jul 8, 2019: …#857 (#858) Besides the bound checking, this now also use utf16 as suggested by go-telegram-bot-api/telegram-bot-api#231
zeridon pushed a commit to zeridon/matterbridge that referenced this issue on Feb 12, 2020: …42wim#857 (42wim#858) Besides the bound checking, this now also use utf16 as suggested by go-telegram-bot-api/telegram-bot-api#231
If you want to convert entities …
Discord …
The issue is also present internally, on the …
How I discovered it
I wanted to get the text (plus emoji) that carried a particular link, but I always got the right Offset with a wrong Length (which is correct for UTF16, but not for my original UTF8 string).
Telegram measures Offset and Length in UTF16 code units, while Go strings are UTF8 byte sequences indexed by byte. With plain ASCII text there are no problems at all, since every ASCII character takes 1 byte in UTF8 and 1 code unit in UTF16. Once an emoji is used, the two counts diverge: non-ASCII characters such as emoji take 2 to 4 bytes in UTF8 but only 1 or 2 UTF16 code units, so the calculation starts to be wrong.
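To make the mismatch concrete, the test message from this issue can be measured three ways. This is a minimal sketch that assumes each arrow emoji is followed by the U+FE0F variation selector (written as escapes so the exact code points are visible); `utf16Len` is a hypothetical helper, not part of any library. It also shows why counting runes is not a general fix: an emoji outside the Basic Multilingual Plane is one rune but two UTF16 code units.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// utf16Len counts UTF16 code units, the unit Telegram uses for Offset and Length.
func utf16Len(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	// Test message from the issue (assumption: arrows carry U+FE0F).
	msg := "\u27a1\ufe0fClick Me\u2b05\ufe0f or \u27a1\ufe0fClick Me\u2b05\ufe0f"
	fmt.Println("UTF8 bytes:", len(msg))                    // 44
	fmt.Println("runes:", utf8.RuneCountInString(msg))      // 28
	fmt.Println("UTF16 code units:", utf16Len(msg))         // 28

	// U+1F600 needs a surrogate pair in UTF16, so runes != code units here.
	face := "\U0001F600"
	fmt.Println(len(face), utf8.RuneCountInString(face), utf16Len(face)) // 4 1 2
}
```

For this particular message every character is in the Basic Multilingual Plane, so runes and UTF16 code units happen to agree; only the byte count differs.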
How I solved this particular problem
I used the unicode/utf16 package to encode the original text as UTF16, extract the part I wanted, and then decode it back to a UTF8 string.
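The encode, slice, decode steps can be sketched as a small helper. This is an illustration under stated assumptions, not the exact code from the issue: `entityText` is a hypothetical name, and the bounds check is an added precaution so that an out-of-range entity returns `ok == false` instead of panicking on the slice expression. The offsets 0 and 16 with length 12 are the values the UTF16 counts of the test message give; they are assumed here rather than taken from a real Telegram response.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// entityText returns the part of text covered by an entity whose offset and
// length are measured in UTF16 code units (hypothetical helper).
func entityText(text string, offset, length int) (string, bool) {
	units := utf16.Encode([]rune(text)) // UTF8 string -> UTF16 code units
	if offset < 0 || length < 0 || offset+length > len(units) {
		return "", false // entity does not fit inside the text
	}
	// Slice in UTF16 space, then decode back to runes and a UTF8 string.
	return string(utf16.Decode(units[offset : offset+length])), true
}

func main() {
	msg := "\u27a1\ufe0fClick Me\u2b05\ufe0f or \u27a1\ufe0fClick Me\u2b05\ufe0f"
	first, _ := entityText(msg, 0, 12)
	second, _ := entityText(msg, 16, 12)
	fmt.Println(first)
	fmt.Println(first == second) // true: both are the full "➡️Click Me⬅️"
}
```

Encoding the whole text once per call is simple but repeats work when a message has many entities; encoding once and reusing the slice avoids that.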
The Code
Given an update of type Update, I wanted to extract each piece of text with an embedded link by using the Message.Entities field.
The original message was "➡️Click Me⬅️ or ➡️Click Me⬅️" with "https://www.example.com/" embedded on both (just as a test).
Not Working Code
Using the following code (not using unicode/utf16):
Output
As you can see, the second emoji of the first element is missing, while the second element is completely broken.
Working Code
The following is a piece of code that totally works (using unicode/utf16):
Output
Elements are just as they should be.
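Applied to every entity of a message, the same idea might look like the sketch below. To stay self-contained it does not import go-telegram-bot-api; instead `MessageEntity` is a local stand-in that mirrors the Type, Offset, Length and URL fields (an assumption; depending on the library version, `update.Message.Entities` may be a slice or a pointer to one). `linkTexts` is a hypothetical helper that keeps only `text_link` entities and skips any entity that falls outside the text.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// MessageEntity is a local stand-in for the library's entity type (assumption).
type MessageEntity struct {
	Type   string
	Offset int // UTF16 code units
	Length int // UTF16 code units
	URL    string
}

// linkTexts returns the text covered by each "text_link" entity.
func linkTexts(text string, entities []MessageEntity) []string {
	units := utf16.Encode([]rune(text)) // encode once, reuse for every entity
	var out []string
	for _, e := range entities {
		if e.Type != "text_link" || e.Offset < 0 || e.Length < 0 || e.Offset+e.Length > len(units) {
			continue // not a link, or out of range
		}
		out = append(out, string(utf16.Decode(units[e.Offset:e.Offset+e.Length])))
	}
	return out
}

func main() {
	msg := "\u27a1\ufe0fClick Me\u2b05\ufe0f or \u27a1\ufe0fClick Me\u2b05\ufe0f"
	entities := []MessageEntity{ // assumed values for the test message
		{Type: "text_link", Offset: 0, Length: 12, URL: "https://www.example.com/"},
		{Type: "text_link", Offset: 16, Length: 12, URL: "https://www.example.com/"},
	}
	for i, s := range linkTexts(msg, entities) {
		fmt.Println(i, s)
	}
}
```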
Conclusion
As you can see, the Offset and Length values are the same in both versions; they are actually correct, as long as they are applied to the UTF16 encoding of the text.
Hope this helps anyone having the same issue!