Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape default arrays and sequences the same other default values #620

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from

Conversation

jacobperron
Copy link
Member

Fixes #610

@jacobperron
Copy link
Member Author

As a regression test, I've added a new field to one of the test messages: ros2/test_interface_files#16

@jacobperron
Copy link
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@jacobperron
Copy link
Member Author

Rpr fails because ament/ament_cmake#352 has not been released.

@sloretz
Copy link
Contributor

sloretz commented Sep 30, 2021

This is interesting. I can't predict if it will pass the test_communication tests. I recommend adding another test to test_interface_files with "ハローワールド" to see what it does when given a string with characters outside the latin-1 character set.

https://github.com/ros2/test_interface_files/blob/ea4d4f33eca97f37b4294e6ab012fa0f216de609/msg/WStrings.msg#L2-L4

IIUC Hellö wörld!" in that .msg file is encoded as the UTF-8 bytes b'Hell\xc3\xb6 w\xc3\xb6rld!'. When read as latin-1 the string in Python would be very different from the original.

>>> b'Hell\xc3\xb6 w\xc3\xb6rld!'.decode('latin-1')
'Hellö wörld!'

When re-encoded as latin-1 that should output the original bytes, but there's some more weirdness when the generators read the idl files.

return codecs.decode(match.group(0), 'unicode-escape')

@jacobperron
Copy link
Member Author

IIUC Hellö wörld!" in that .msg file is encoded as the UTF-8 bytes b'Hell\xc3\xb6 w\xc3\xb6rld!'. When read as latin-1 the string in Python would be very different from the original.

In fact there were test failures related to this that I missed locally: https://ci.ros2.org/job/ci_linux/15332/testReport/junit/rosidl_generator_py.test/test_interfaces/test_wstrings/

@jacobperron jacobperron self-assigned this Oct 5, 2021
Fix #610

Apply the same encode/decode pattern and escaping as for other default values.

Signed-off-by: Jacob Perron <[email protected]>
@jacobperron jacobperron changed the title Use latin-1 encoding for reading interface file content Escape default arrays and sequences the same other default values Oct 14, 2021
@jacobperron
Copy link
Member Author

I think the issue was due to a difference in how we handled default values of arrays and sequences compared with other default values. See 8253770, which applies similar logic to array/sequence defaults as we do with other defaults, e.g.

estr = string.encode().decode('unicode_escape')
estr = estr.replace('"', r'\"')

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:23
@jacobperron jacobperron removed their assignment May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Colcon build fails with UnicodeDecodeError for WstringArrays in Foxy, but successful in dashing
2 participants