Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Escape default arrays and sequences the same other default values #620

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from

Conversation

jacobperron
Copy link
Member

Fixes #610

@jacobperron
Copy link
Member Author

As a regression test, I've added a new field to one of the test messages: ros2/test_interface_files#16

@jacobperron
Copy link
Member Author

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@jacobperron
Copy link
Member Author

Rpr fails because ament/ament_cmake#352 has not been released.

@sloretz
Copy link
Contributor

sloretz commented Sep 30, 2021

This is interesting. I can't predict if it will pass the test_communication tests. I recommend adding another test to test_interface_files with "ハローワールド" to see what it does when given a string with characters outside the latin-1 character set.

https://github.com/ros2/test_interface_files/blob/ea4d4f33eca97f37b4294e6ab012fa0f216de609/msg/WStrings.msg#L2-L4

IIUC Hellö wörld!" in that .msg file is encoded as the UTF-8 bytes b'Hell\xc3\xb6 w\xc3\xb6rld!'. When read as latin-1 the string in Python would be very different from the original.

>>> b'Hell\xc3\xb6 w\xc3\xb6rld!'.decode('latin-1')
'Hellö wörld!'

When re-encoded as latin-1 that should output the original bytes, but there's some more weirdness when the generators read the idl files.

return codecs.decode(match.group(0), 'unicode-escape')

@jacobperron
Copy link
Member Author

IIUC Hellö wörld!" in that .msg file is encoded as the UTF-8 bytes b'Hell\xc3\xb6 w\xc3\xb6rld!'. When read as latin-1 the string in Python would be very different from the original.

In fact there were test failures related to this that I missed locally: https://ci.ros2.org/job/ci_linux/15332/testReport/junit/rosidl_generator_py.test/test_interfaces/test_wstrings/

@jacobperron jacobperron self-assigned this Oct 5, 2021
Fix #610

Apply the same encode/decode pattern and escaping as for other default values.

Signed-off-by: Jacob Perron <jacob@openrobotics.org>
@jacobperron jacobperron force-pushed the jacob/rolling_fix_610 branch from 548af14 to 8253770 Compare October 14, 2021 20:02
@jacobperron jacobperron changed the title Use latin-1 encoding for reading interface file content Escape default arrays and sequences the same other default values Oct 14, 2021
@jacobperron
Copy link
Member Author

I think the issue was due to a difference in how we handled default values of arrays and sequences compared with other default values. See 8253770, which applies similar logic to array/sequence defaults as we do with other defaults, e.g.

estr = string.encode().decode('unicode_escape')
estr = estr.replace('"', r'\"')

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@audrow audrow changed the base branch from master to rolling June 28, 2022 14:23
@jacobperron jacobperron removed their assignment May 9, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Colcon build fails with UnicodeDecodeError for WstringArrays in Foxy, but successful in dashing
2 participants