VLR String Description Inconsistency #50

BenCurran98 · 2025-02-13T06:32:34Z

Fix inconsistency in writestring due to unicode encoding

Description

This fixes a bug that occasionally appears when reading VLRs and EVLRs from a LAS file whose descriptions contain Unicode characters that take 2 bytes to encode instead of 1. When we write a VLR description, we call writestring(io, description, 32), since per the LAS spec the description field is 32 bytes long. However, writestring computed its padding (trailing "\0"s) from length(description), which counts characters rather than bytes. If length(description) < 32, it padded the string even when sizeof(description) == 32, so extra bytes were written to the file and caused corruption further down the line.
The fix is simple: replace length with sizeof in writestring.
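
The difference between the two counts can be seen in a short, self-contained sketch. Note that write_padded below is a hypothetical stand-in written for illustration, not the package's actual writestring; it only assumes the behaviour described above (write the string, then pad with zero bytes up to nb).

```julia
# Character count vs. byte count for a string containing a 2-byte character.
desc = "aβcd"
nchars = length(desc)   # number of characters
nbytes = sizeof(desc)   # number of bytes in the UTF-8 encoding (β takes two)

# A description that already fills the 32-byte field but has only 31 characters.
full = "β" * "a"^30

# Hypothetical fixed-width writer (an assumption, not the real writestring).
# `count` selects how the string is measured: `length` reproduces the bug,
# `sizeof` reproduces the fix.
function write_padded(io::IO, str::String, nb::Integer; count = sizeof)
    n = count(str)
    n > nb && throw(ArgumentError("string does not fit in a $nb-byte field"))
    written = write(io, str)                      # writes the UTF-8 bytes
    written += write(io, zeros(UInt8, nb - n))    # trailing "\0" padding
    return written
end

buggy = write_padded(IOBuffer(), full, 32; count = length)  # pads by characters
fixed = write_padded(IOBuffer(), full, 32; count = sizeof)  # pads by bytes
```

With the character-based count, the 32-byte description gets one extra padding byte, which is the kind of overrun that shifts every field read after it.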

Types of Changes

Bug fix (non-breaking change which fixes an issue)

Review

List of tasks the reviewer must do to review the PR

Tests
Documentation
CHANGELOG

Signed-off-by: BenCurran98 <b.curran@fugro.com>

n6171028 · 2025-02-13T06:36:37Z

src/util.jl

@@ -20,7 +20,7 @@ end
 Write a string `str` to an IO channel `io`, writing exactly `nb` bytes (padding if `str` is too short)
 """
 function writestring(io, str::AbstractString, nb::Integer)
-    n = length(str)
+    n = Base.sizeof(str)


This is very tricky... how did you find it?

Printing a bunch of stuff out over several days 😆

MeganDawson42 · 2025-02-13T06:37:21Z

test/util.jl

+
+    # test with non-ASCII characters (code point >= 0x80), which take more than one UTF-8 "code unit" (2 for β), not one
+    # see here for more details: https://docs.julialang.org/en/v1/manual/strings/#Unicode-and-UTF-8
+    str = "aβcd"


missed opportunity to add emoticons?
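
As a rough sketch of what such a test could look like, the snippet below covers an ASCII string, the 2-byte β case, and a 4-byte emoji. It is self-contained and uses a hypothetical write_padded helper in place of the package's writestring, so the names and structure are assumptions rather than the actual test file.

```julia
using Test

# Hypothetical stand-in for writestring: pads by byte count (sizeof), not by
# character count (length). Not the package's actual implementation.
function write_padded(io::IO, str::String, nb::Integer)
    n = sizeof(str)
    n > nb && throw(ArgumentError("string needs $n bytes but the field holds $nb"))
    return write(io, str) + write(io, zeros(UInt8, nb - n))
end

results = @testset "fixed-width writes with multi-byte characters" begin
    for str in ("abcd", "aβcd", "a😆cd")
        io = IOBuffer()
        @test write_padded(io, str, 32) == 32            # exactly 32 bytes reported
        bytes = take!(io)
        @test length(bytes) == 32                        # exactly 32 bytes in the buffer
        @test String(bytes[1:sizeof(str)]) == str        # payload round-trips
        @test all(iszero, bytes[sizeof(str)+1:end])      # the rest is zero padding
    end
end
```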

Fix inconsistency in writestring due to unicode encoding

cda9bec

Signed-off-by: BenCurran98 <b.curran@fugro.com>

BenCurran98 requested review from n6171028, MeganDawson42, mazadbakht, zqiaofugro and i-kieu February 13, 2025 06:32

n6171028 reviewed Feb 13, 2025

n6171028 approved these changes Feb 13, 2025

MeganDawson42 approved these changes Feb 13, 2025

MeganDawson42 reviewed Feb 13, 2025

BenCurran98 merged commit 3f8100b into main Feb 13, 2025
10 checks passed

BenCurran98 deleted the VLRDescBug branch February 13, 2025 06:41
