Skip to content

Ensure rcutils_char_array_t is null-termiated after memcpy #493

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 1 commit into
base: rolling
Choose a base branch
from

Conversation

cottsay
Copy link
Member

@cottsay cottsay commented Mar 31, 2025

When copying a substring from another string, we need to explicitly verify that the new string will contain a null terminator, which may necessitate allocation of an additional byte.

@cottsay cottsay added the bug Something isn't working label Mar 31, 2025
@cottsay cottsay self-assigned this Mar 31, 2025
When copying a substring from another string, we need to explicitly
verify that the new string will contain a null terminator, which may
necessitate allocation of an additional byte.

Signed-off-by: Scott K Logan <logans@cottsay.net>
@cottsay cottsay force-pushed the cottsay/char-array-memcpy branch from 630aadd to 87321fa Compare March 31, 2025 19:15
@cottsay
Copy link
Member Author

cottsay commented Mar 31, 2025

Preliminary CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

Full CI:

  • (pending green preliminary CI)

@wjwwood
Copy link
Member

wjwwood commented Mar 31, 2025

I think that the reason this function does not handle anything with null terminator characters is because the function it was the analogue of, memcpy, also does not handle null terminator:

The function does not check for any terminating null character in source - it always copies exactly num bytes.
-- https://cplusplus.com/reference/cstring/memcpy/

I don't remember the details, but there are good reasons for this. If we want to have a function that does concern itself with null terminators, then I think it would be better to have a differently named function for it.

@wjwwood
Copy link
Member

wjwwood commented Mar 31, 2025

Or maybe we're already messing with the idea of memcpy by growing the destination as needed...

@cottsay
Copy link
Member Author

cottsay commented Mar 31, 2025

I can follow that logic and I think it would be fine to close this PR with that in mind.

That said, do you think that it is part of the rcutils_char_array_t contract that the buffer is always null-terminated? Assuming of course that the caller isn't directly modifying the data.

rcutils_ret_t ret = rcutils_char_array_expand_as_needed(char_array, n);
size_t new_length = n;
if (n > 0 && '\0' != src[n - 1]) {
new_length += 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the fact that the new length will depend on the content being copied, and it also prevents you from intentionally creating a character array without a null terminating character (maybe you're tracking the length separately and build a string out of intermediate string fragments or something idk). I suppose it's fine because you can use the rcutils_uint8_array_t for that, but it still feels weird.

Copy link
Member Author

@cottsay cottsay Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd question whether non-terminated strings is something this type is intended to support at all. The _strcat and _strncat functions expect the character at index buffer_length - 1 to be a terminator and overwrite it with the beginning of the source string. Also, the _resize function writes a null terminator to the buffer when the string is shortened. Oh, actually it only does that if it doesn't own the buffer?

The argument against changing the allocated size based on content is valid. I'd say we should balance that against the risk of an inconsistent state we live with today.

if (ret != RCUTILS_RET_OK) {
// rcutils_char_array_expand_as needed already set the error
return ret;
}
memcpy(char_array->buffer, src, n);
char_array->buffer_length = n;
char_array->buffer[new_length - 1] = '\0'; // always have an ending
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is my main complaint with this change, consider this additional test code:

  {
    rcutils_ret_t ret = rcutils_char_array_init(&char_array, 8, &allocator);
    ASSERT_EQ(RCUTILS_RET_OK, ret);
    
    EXPECT_EQ(RCUTILS_RET_OK, rcutils_char_array_strcpy(&char_array, "hello world"));
    EXPECT_STREQ("hello world", char_array.buffer);
    EXPECT_EQ(12lu, char_array.buffer_length);

    EXPECT_EQ(RCUTILS_RET_OK, rcutils_char_array_memcpy(&char_array, "HELLO", strlen("HELLO")));
    EXPECT_STREQ("HELLO world", char_array.buffer);
    EXPECT_EQ(12lu, char_array.buffer_length);
  }

Personally I would expect the result (with memcpy) to be "HELLO world", but it's not:

[ RUN      ] ArrayCharTest.memcpy
/workspaces/rcutils/test/test_char_array.cpp:162: Failure
Expected equality of these values:
  "HELLO world"
  char_array.buffer
    Which is: "HELLO"

/workspaces/rcutils/test/test_char_array.cpp:163: Failure
Expected equality of these values:
  12lu
    Which is: 12
  char_array.buffer_length
    Which is: 6

[  FAILED  ] ArrayCharTest.memcpy (0 ms)

Maybe this could be addressed with only setting the null-terminator when the array was expanded?

Copy link
Member Author

@cottsay cottsay Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this PR, the buffer would be in the state you're expecting it to be, but the buffer_length would be set to 5. The way I see it, the function is disregarding the "extra" space available in the buffer at that point, and only considers the string's "length" to be 5 characters long. Also, if I subsequently performed a rcutils_char_array_strcat() it would pick up writing to the string at index 5, not after the null terminator.

So unless the memory we're copying in has a single null character right at the end, we'll be in an inconsistent state.

@cottsay
Copy link
Member Author

cottsay commented Mar 31, 2025

Maybe this is just a documentation problem. The other functions seem to keep the buffer termination and length value in a consistent state, but this function seems a little more dangerous and depending on the source content, things can get wonky.

Would some sort of note in the docblock to that effect be better?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants