Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

VLR String Description Inconsistency #50

Merged
merged 1 commit into from
Feb 13, 2025
Merged

VLR String Description Inconsistency #50

merged 1 commit into from
Feb 13, 2025

Conversation

BenCurran98
Copy link
Collaborator

@BenCurran98 BenCurran98 commented Feb 13, 2025

Fix inconsistency in writestring due to unicode encoding

Description

This fixes a bug that crops up sometimes when reading VLRs and EVLRs from a LAS file whose descriptions contain unicode characters with 2 bytes of length instead of 1. Basically, when we write a VLR description, we call writestring(io, description, 32) (since as per the LAS spec, the description should be 32 bytes in length). However, writestring would add a buffer to the string (trailing "\0" 's) if length(description) is < 32, even if size(description) == 32, meaning we're adding extra bytes to the file causing some corruptions down the line.
This turns into a simple fix, just replacing length with size in writestring

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)

Review

List of tasks the reviewer must do to review the PR

  • Tests
  • Documentation
  • CHANGELOG

Signed-off-by: BenCurran98 <b.curran@fugro.com>
@@ -20,7 +20,7 @@ end
Write a string `str` to an IO channel `io`, writing exactly `nb` bytes (padding if `str` is too short)
"""
function writestring(io, str::AbstractString, nb::Integer)
n = length(str)
n = Base.sizeof(str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very tricky...how you found it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Printing a bunch of stuff out over several days 😆


# test with special characters whose ASCII encoding is > 0x80, meaning their number of "code units" will be 2, not one
# see here for more details: https://docs.julialang.org/en/v1/manual/strings/#Unicode-and-UTF-8
str = "aβcd"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed opportunity to add emoticons?

@BenCurran98 BenCurran98 merged commit 3f8100b into main Feb 13, 2025
10 checks passed
@BenCurran98 BenCurran98 deleted the VLRDescBug branch February 13, 2025 06:41
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants