Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

[C++] LogBufferDescriptor::indexByPosition triggers integer overflow #579

Closed
denizevrenci opened this issue Nov 7, 2018 · 3 comments
Closed

Comments

@denizevrenci
Copy link
Contributor

denizevrenci commented Nov 7, 2018

When I reassign a publication to the same channel by

publication = aeron.findPublication(regId)

Thereby, the previous publication object managed by the shared_ptr is destroyed and a new object is started to be managed by the shared_ptr. The subscription side receives the new image event before the unavailable image event.

During this time, if I poll the subscription, the soon-to-be-unavailable image is still accessible by the subscription. However, when poll traverses up the stack, it seems to try to access destructed memory. Below is the relevant part of the stack trace.

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0x000000000040cf30 in aeron::concurrent::atomic::getInt32Volatile (source=0x100000001)

#0  0x000000000040cf30 in aeron::concurrent::atomic::getInt32Volatile (source=0x100000001)
    at ../thirdparty/aeron/source/aeron-client/src/main/cpp/concurrent/atomic/Atomic64_gcc_x86_64.h:67

#1  0x000000000040cf19 in aeron::concurrent::AtomicBuffer::getInt32Volatile (this=0x7ffff0001458, offset=0)
    at ../thirdparty/aeron/source/aeron-client/src/main/cpp/concurrent/AtomicBuffer.h:242

#2  0x000000000040740c in aeron::concurrent::logbuffer::FrameDescriptor::frameLengthVolatile (logBuffer=..., frameOffset=0)
    at ../thirdparty/aeron/source/aeron-client/src/main/cpp/concurrent/logbuffer/FrameDescriptor.h:149

#3  0x000000000040fa03 in aeron::concurrent::logbuffer::TermReader::read<...>(...) (outcome=..., termBuffer=..., termOffset=0, handler=..., fragmentsLimit=2147483647, header=..., exceptionHandler=...) at ../thirdparty/aeron/source/aeron-client/src/main/cpp/concurrent/logbuffer/TermReader.h:85

#4  0x000000000040f969 in aeron::Image::poll<...>(...) (this=0x7ffff0001478, fragmentHandler=..., fragmentLimit=2147483647) at ../thirdparty/aeron/source/aeron-client/src/main/cpp/Image.h:398

#5  0x000000000040f813 in aeron::Subscription::poll<...>(...) (this=0x7ffff0000c60, fragmentHandler=..., fragmentLimit=2147483647) at ../thirdparty/aeron/source/aeron-client/src/main/cpp/Subscription.h:177

Could the conductor thread be cleaning up the term buffer while the main thread is making a call to Subscription::poll?

@denizevrenci
Copy link
Contributor Author

Code to reproduce:

#include <cassert>
#include <thread>
#include <type_traits>

#include <Aeron.h>

constexpr auto message_size = 1024u;
constexpr auto payload_size = message_size - 32u;
constexpr auto msg_count_until_max_pos = 100u;

constexpr int64_t get_max_position(const unsigned int term_buf_len_pow) {
    constexpr auto max_pos_mult_pow = 31;
    return 1L << (term_buf_len_pow + max_pos_mult_pow);
}

auto offer(aeron::ExclusivePublication& pub, const unsigned int value) {
    uint8_t buf[payload_size];
    aeron::concurrent::AtomicBuffer transport_buf{buf, payload_size};
    transport_buf.overlayStruct<std::remove_const_t<decltype(value)>>(0) = value;
    auto result = pub.offer(transport_buf);
    while (result == aeron::BACK_PRESSURED)
        result = pub.offer(transport_buf);
    return result;
}

auto claim(aeron::ExclusivePublication& pub, const unsigned int value) {
    aeron::concurrent::logbuffer::BufferClaim claim;
    auto result = pub.tryClaim(payload_size, claim);
    while (result == aeron::BACK_PRESSURED)
        result = pub.tryClaim(payload_size, claim);
    if (result) {
        claim.buffer().overlayStruct<std::remove_const_t<decltype(value)>>(claim.offset()) = value;
        claim.commit();
    }
    return result;
}

template <typename publish_method_t>
void hit_max_position(
        aeron::ExclusivePublication& pub,
        aeron::Subscription& sub,
        const int64_t initial_position,
        const int64_t max_position,
        publish_method_t&& publish) noexcept {
    auto send_value = 0u;
    auto expected_rcv_value = 0u;
    auto expected_position = initial_position;
    while (send_value != msg_count_until_max_pos) {
        auto result = publish(pub, send_value++);
        expected_position += message_size;
        assert(result == expected_position);
        sub.poll([&expected_rcv_value] (
                const aeron::concurrent::AtomicBuffer& buffer,
                aeron::util::index_t offset,
                aeron::util::index_t,
                const aeron::Header&) noexcept {
            assert(buffer.template overlayStruct<decltype(expected_rcv_value)>(offset) == expected_rcv_value++);
        }, 1);
    }
    assert(expected_position == max_position);
    auto result = publish(pub, send_value);
    assert(result == aeron::MAX_POSITION_EXCEEDED);
    while (expected_rcv_value != send_value) {
        sub.poll([&expected_rcv_value] (
                const aeron::concurrent::AtomicBuffer& buffer,
                aeron::util::index_t offset,
                aeron::util::index_t,
                const aeron::Header&) noexcept {
            assert(buffer.template overlayStruct<decltype(expected_rcv_value)>(offset) == expected_rcv_value++);
        }, 1);
    }
};

template <typename publish_method_t>
void check_mitigation(
        aeron::ExclusivePublication& pub,
        aeron::Subscription& sub,
        publish_method_t&& publish) noexcept {
    auto send_value = 0u;
    auto expected_rcv_value = 0u;
    int64_t expected_position = 0;
    while (send_value != msg_count_until_max_pos) {
        auto result = publish(pub, send_value++);
        expected_position += message_size;
        assert(result == expected_position);
        sub.poll([&expected_rcv_value] (
                const aeron::concurrent::AtomicBuffer& buffer,
                aeron::util::index_t offset,
                aeron::util::index_t,
                const aeron::Header&) noexcept {
            assert(buffer.template overlayStruct<decltype(expected_rcv_value)>(offset) == expected_rcv_value++);
        }, 1);
    }
    while (expected_rcv_value != send_value) {
        sub.poll([&expected_rcv_value] (
                const aeron::concurrent::AtomicBuffer& buffer,
                aeron::util::index_t offset,
                aeron::util::index_t,
                const aeron::Header&) noexcept {
            assert(buffer.template overlayStruct<decltype(expected_rcv_value)>(offset) == expected_rcv_value++);
        }, 1);
    }
}

int main() {
    constexpr auto max_position = get_max_position(26);
    constexpr int64_t offset = message_size * msg_count_until_max_pos;
    constexpr auto initial_position = max_position - offset;

    aeron::Context c;
    c.aeronDir("/dev/shm/transport_test_max_position_exceeded");
    aeron::Aeron client(c);

    auto pub_reg_id = client.addExclusivePublication("aeron:ipc?init-term-id=0|term-length=67108864|term-offset=67006464|term-id=2147483647", 0);
    auto sub_reg_id = client.addSubscription(
        "aeron:ipc",
        0,
        [] (aeron::Image& i) {std::cout << "up " << i.sessionId() << std::endl; },
        [] (aeron::Image& i) {std::cout << "down " << i.sessionId() << std::endl; });
    auto pub = client.findExclusivePublication(pub_reg_id);
    while (!pub) {
        std::this_thread::yield();
        pub = client.findExclusivePublication(pub_reg_id);
    }
    while (!pub->isConnected())
        std::this_thread::yield();
    auto sub = client.findSubscription(sub_reg_id);
    while (!sub) {
        std::this_thread::yield();
        sub = client.findSubscription(sub_reg_id);
    }
    hit_max_position(*pub, *sub, initial_position, max_position, offer);

    pub_reg_id = client.addExclusivePublication("aeron:ipc", 0);
    pub = client.findExclusivePublication(pub_reg_id);
    while (!pub) {
        std::this_thread::yield();
        pub = client.findExclusivePublication(pub_reg_id);
    }
    while (!pub->isConnected())
        std::this_thread::yield();
    check_mitigation(*pub, *sub, offer);
}

@denizevrenci
Copy link
Contributor Author

For convenience, I added the code above as a system test.

https://github.com/denizevrenci/aeron/tree/max_pos_exceeded_test

@denizevrenci
Copy link
Contributor Author

Ok, found the issue. It isn't a race condition so I will change the issue title.
Problem is caused by LogBufferDescriptor::indexByPosition() as it casts the result of position >> positionBitsToShift to int. By coincidence, this value is equal to INT_MAX + 1 for the default term buffer size of IPC publications which causes wrap around and we get a negative index from this function. It can simply be fixed by casting after the modulo operation.

@denizevrenci denizevrenci changed the title [C++] Possible race condition during cleanup of image term buffers [C++] LogBufferDescriptor::indexByPosition triggers integer overflow Nov 14, 2018
@mjpt777 mjpt777 closed this as completed Nov 16, 2018
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants