Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Savedata is not endianness portable #2693

Open
nurupo opened this issue Feb 17, 2024 · 3 comments
Open

Savedata is not endianness portable #2693

nurupo opened this issue Feb 17, 2024 · 3 comments
Labels
bug Bug fix for the user, not a fix to a build script P1 High priority
Milestone

Comments

@nurupo
Copy link
Member

nurupo commented Feb 17, 2024

Savedata created on a little-endian systems will not load on big-endian systems and the other way around.

I have looked into save_compatibility_test failing on s390x and found a couple of issues that I would like to document:

  1. Toxcore uses little-endian to host and host to little-endian conversion functions. For example, lendian_bytes_to_host32() in tox_load():

    c-toxcore/toxcore/tox.c

    Lines 704 to 709 in 710eb67

    memcpy(data32, data, sizeof(uint32_t));
    lendian_bytes_to_host32(data32 + 1, data + sizeof(uint32_t));
    if (data32[0] != 0 || data32[1] != STATE_COOKIE_GLOBAL) {
    return -1;
    }

    They function such that if WORDS_BIGENDIAN is defined, they assume the host is big-endian, otherwise it's little endian:

    c-toxcore/toxcore/state.c

    Lines 127 to 136 in 710eb67

    void lendian_bytes_to_host32(uint32_t *dest, const uint8_t *lendian)
    {
    uint32_t d;
    memcpy(&d, lendian, sizeof(uint32_t));
    #ifdef WORDS_BIGENDIAN
    d = ((d << 8) & 0xFF00FF00) | ((d >> 8) & 0xFF00FF);
    d = (d << 16) | (d >> 16);
    #endif /* WORDS_BIGENDIAN */
    *dest = d;
    }

    However, WORDS_BIGENDIAN is never defined by anything on a big-endian system, so those functions always assume the host is little-endian and just memcpy() the data as is, without any conversion. This results in little-endian systems storing and reading integers in the little-endian order, and big-endian systems storing and reading integers in the big-endian order.

    Trying to load a savedata created on amd64 on a s390x system will result in the tox_load() code linked above returning -1 on line 708 as the savedata file's magic number won't match due to the wrong endianness, with tox_new() returning TOX_ERR_NEW_LOAD_BAD_FORMAT to the user, as evident by save_compatibility_test failing on s390s with:

    ./auto_tests/save_compatibility_test.c:87: failed `tox': failed to create tox, error number: 9
    

    size_t index = 0;
    Tox_Err_New err;
    Tox *tox = tox_new_log(&options, &err, &index);
    ck_assert_msg(tox, "failed to create tox, error number: %d", err);

  2. If we fix the previous issue by defining WORDS_BIGENDIAN on a big-endian system, e.g. by adding the following to CMakeLists.txt:

    include(TestBigEndian)
    test_big_endian(IS_BIG_ENDIAN)
    if(IS_BIG_ENDIAN)
      add_definitions(-DWORDS_BIGENDIAN)
    endif()

    then the magic number matches and the code proceeds further in parsing the savedata, however the s390x save_compatibility_test then fails with:

    [#0] ERROR state.c:41   state_load:     state file garbled: 01ce != 01ce
    

    uint32_t cookie_type;
    lendian_bytes_to_host32(&cookie_type, data + sizeof(uint32_t));
    data += size_head;
    length -= size_head;
    if (length < length_sub) {
    /* file truncated */
    LOGGER_ERROR(log, "state file too short: %u < %u", length, length_sub);
    return -1;
    }
    if (lendian_to_host16(cookie_type >> 16) != cookie_inner) {
    /* something is not matching up in a bad way, give up */
    LOGGER_ERROR(log, "state file garbled: %04x != %04x", cookie_type >> 16, cookie_inner);
    return -1;
    }
    const uint16_t type = lendian_to_host16(cookie_type & 0xFFFF);

    The issue here is that the code does the same endianness conversion twice. On line 28 it converts little-endian to host 32-bit. This will convert the little-endian to the big-endian representation. But then on the line 39 it performs a little-endian to host conversion again, mistakenly converting those 16 bits from the host endianness (which is big-endian! this is why calling lendian_to_host16() on it produces non-portable result, as it's not lendian to begin with) to the little endian. Then the comparison fails because left side is little-endian now and the right side is big-endian. (Then there are the lower 16-bits being converted on like 45, which also is unnecessary and will produce an incorrect result).

    The saving code does the conversion twice too:

    host_to_lendian_bytes32(data, (host_to_lendian16(cookie_type) << 16) | host_to_lendian16(section_type));

    which are no-ops on little endian but produce unexpected result on big-endian.

Fixing this would break the savedata format on big-endian systems, changing it in non backwards-compatible way. There have been talk of switching savedata to using msgpack, which would also be a breaking change, so it might make sense to keep the current broken behavior for now and do all the savedata breaking changes together.

Here is a small snippet to reproduce the issue
$ git checkout 710eb674a50f17201bb7556d389d82d2133d98de
$ sudo docker run -it --rm -v "$PWD":/toxcore debian:bookworm /bin/bash
# dpkg --add-architecture s390x
# apt-get update
# apt-get install -y --no-install-recommends binutils-s390x-linux-gnu gcc-s390x-linux-gnu g++-s390x-linux-gnu qemu-user-static
# apt-get install -y libsodium-dev:s390x libopus-dev:s390x libvpx-dev:s390x pkg-config:s390x
# apt-get install -y cmake make
# echo "
    SET(CMAKE_SYSTEM_NAME Linux)

    SET(CMAKE_C_COMPILER   /usr/bin/s390x-linux-gnu-gcc)
    SET(CMAKE_CXX_COMPILER /usr/bin/s390x-linux-gnu-g++)
    SET(CMAKE_AR           /usr/bin/s390x-linux-gnu-ar)

    SET(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
    SET(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
    SET(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)

    SET(CROSSCOMPILING_EMULATOR /usr/bin/qemu-s390x-static)
" > /usr/local/share/s390x-linux-gnu.cmake
# export PKG_CONFIG_LIBDIR=/usr/lib/s390x-linux-gnu/pkgconfig:/usr/share/pkgconfig
# cp -a /toxcore/ ~/toxcore
# cd ~/toxcore/
# rm -rf _build/
# cmake -B_build -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=/usr/local/share/s390x-linux-gnu.cmake -DENABLE_SHARED=ON -DENABLE_STATIC=OFF -DAUTOTEST=ON -DBOOTSTRAP_DAEMON=OFF
# cmake --build _build --parallel "$(nproc)" --target auto_save_compatibility_test
# cd _build/
# ./auto_tests/auto_save_compatibility_test 
@nurupo nurupo added bug Bug fix for the user, not a fix to a build script P1 High priority labels Feb 17, 2024
@nurupo nurupo added this to the v0.3.0 milestone Feb 17, 2024
@iphydf
Copy link
Member

iphydf commented Feb 17, 2024

Why 0.3.0? Can this not be fixed earlier?

@iphydf
Copy link
Member

iphydf commented Feb 17, 2024

I see, your comment says big endian systems will break. True, but does anyone actually use toxcore on that? Is it worth delaying the fix by months/years?

nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 17, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
@nurupo nurupo modified the milestones: v0.3.0, v0.2.x Feb 17, 2024
@nurupo
Copy link
Member Author

nurupo commented Feb 17, 2024

@iphydf well, that was before we had the discussion that it could be done without possibly breaking things and with Tox_Options flag(s).

There you go, updated the milestone. Not sure if v0.2.x or v0.2.20 is more appropriate, feel free to change it if I got it wrong.

nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 17, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 17, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
nurupo added a commit to nurupo/InsertProjectNameHere that referenced this issue Feb 18, 2024
save_compatibility_test was failing on big-endian systems, as it was
written and tested on a little-endian system and savedata is not
endianness portable[1].

[1] TokTok#2693
@iphydf iphydf modified the milestones: v0.2.x, v0.2.20 Feb 21, 2024
@iphydf iphydf modified the milestones: v0.2.20, v0.2.21 Nov 6, 2024
@iphydf iphydf modified the milestones: v0.2.21, v0.2.x Nov 27, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug Bug fix for the user, not a fix to a build script P1 High priority
Development

No branches or pull requests

2 participants