Thai character set order misalignment when parsing or writing string #30683
Hi @chrisbloe! Thanks for reporting this.

In the Terraform language, strings are treated as Unicode text and are therefore subject to normalization per UAX #15, so Terraform treats any two strings with the same normalized form as equal. This behavior is particularly important in situations where Terraform needs to compare strings for its own work, such as deciding whether a string given in the configuration is equal to a string returned by the remote system, which might encode the same characters using different (but equivalent) sequences of Unicode code points.

Although I'm not familiar with these characters in particular, from how they are rendered it seems they are intended to combine with the character that comes before them. Characters of that class are typically the ones most affected by Unicode text normalization[1], because in practice the preceding code point and all of the combining code points that follow it form a single "user-perceived character" as far as Unicode is concerned. From a Unicode text perspective, the order in which those individual combining code points (of different combining classes) appear therefore carries no significance.

Given that, I think Terraform is working as intended here (indirectly: it relies on Unicode specifications which themselves seem to intend this behavior). It does raise the question of what you could have done differently to preserve your input byte-for-byte rather than having it interpreted as Unicode text, because your source file seems to intentionally represent particular non-canonical Unicode sequences for another piece of software, which presumably implements Unicode specifications (or something similar) itself.

The typical way we represent raw bytes in the Terraform language is as base64-encoded strings, and the function … However, the …

    content_base64 = filebase64("${path.module}/files/stuff/${source.value}")

If we take that approach then it would be a change in the repository of the …

Thanks again for reporting this!
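A minimal Python sketch of the effect described above, and of why a base64-encoded value avoids it. It assumes NFC as the normalization form (Terraform's string type is understood to normalize to NFC, but treat that as an assumption) and uses Python's standard `unicodedata` module as a stand-in; Terraform itself is written in Go, so this is not its code. The three-code-point sample is hypothetical and not necessarily the sequence from the report.

```python
import base64
import unicodedata

# Hypothetical sample (not necessarily the characters from the report):
# THAI CHARACTER KO KAI (a starter), then two combining marks in
# non-canonical order: MAI EK (combining class 107), SARA U (class 103).
original = "\u0e01\u0e48\u0e38"

# Assumption: NFC stands in for the normalization Terraform applies to strings.
normalized = unicodedata.normalize("NFC", original)

print([f"U+{ord(c):04X}" for c in original])    # ['U+0E01', 'U+0E48', 'U+0E38']
print([f"U+{ord(c):04X}" for c in normalized])  # ['U+0E01', 'U+0E38', 'U+0E48']
print(original.encode("utf-8") == normalized.encode("utf-8"))  # False: bytes differ

# Workaround idea: carry the raw UTF-8 bytes as base64 text instead.
# Base64 output is plain ASCII, which normalization leaves untouched,
# so decoding it later yields the original bytes exactly.
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")
print(unicodedata.normalize("NFC", encoded) == encoded)        # True
print(base64.b64decode(encoded) == original.encode("utf-8"))   # True
```

This mirrors the idea of pairing `filebase64` with a `content_base64`-style argument: the file's bytes never pass through Terraform as Unicode text, so there is nothing for normalization to reorder.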
[1] UAX #15, Section 1.3, includes the following, which I think is the most relevant part for explaining the behavior you observed:
Following this specification's terminology, another way to understand my statement above is that I think the characters you identified here are "non-starters" and are therefore subject to reordering during normalization. We should be able to confirm that by checking how those characters are annotated in UnicodeData.txt (their Canonical Combining Classes).
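The check suggested above can be done without reading UnicodeData.txt by hand: Python's `unicodedata.combining()` returns a character's canonical combining class (0 for starters). The sketch below also implements just the canonical ordering step of normalization (stably sorting each run of non-starters by combining class) to show that this step alone accounts for the reordering. The sample code points are illustrative Thai characters chosen for this example, not necessarily those from the report, and the helper names are hypothetical.

```python
import unicodedata


def ccc_table(text: str) -> list[tuple[str, str, int]]:
    """(code point, character name, canonical combining class) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in text
    ]


def canonical_order(text: str) -> str:
    """Canonical ordering only: stably sort each run of non-starters (ccc != 0)
    by ccc, leaving starters (ccc == 0) in place. No decomposition or
    composition is performed, so this matches NFC only for text that has
    no canonical decompositions."""
    out: list[str] = []
    run: list[str] = []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            out.extend(sorted(run, key=unicodedata.combining))  # sorted() is stable
            run.clear()
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


# Hypothetical sample: KO KAI, MAI EK (107), PHINTHU (9), SARA U (103).
sample = "\u0e01\u0e48\u0e3a\u0e38"
for row in ccc_table(sample):
    print(row)

reordered = canonical_order(sample)
print([f"U+{ord(c):04X}" for c in reordered])  # ['U+0E01', 'U+0E3A', 'U+0E38', 'U+0E48']
# These Thai code points have no canonical decompositions, so NFC reduces to reordering.
print(reordered == unicodedata.normalize("NFC", sample))  # True
```

Any character that prints a non-zero class is a non-starter, and a run of such characters can be rearranged by normalization even though it renders identically.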
Thanks for such a quick reply! It seems this issue is a red herring for another problem I've been having, but your answer may well help others if they run up against this in the future.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Terraform Version
I'm running on Ubuntu under WSL:
Terraform Configuration Files
The following content variable contains 3 Thai characters. When pasted, it should look like this (e.g. in Notepad++):
Debug Output
I haven't done this, but it's fairly trivial to reproduce.
Expected Behavior
The 3 characters should stay in the same order.
Actual Behavior
The last character moves to the front of the text:
Steps to Reproduce
terraform init
terraform apply
Additional Context
I discovered this while doing some testing with archive_file, as seen below. One of the files I was zipping was langthaimodel.py (see References); I narrowed it down to the three characters included in the above example. I tested with local_file and saw it had the same problem, so I think this is a problem with Terraform itself (or a library or platform it uses) and not with a specific provider.
References