
C++ client: heap buffer overflow in Subscription::poll if conductor thread calls Subscription::removeImage concurrently #383

Closed
lukeocamden opened this issue Jul 28, 2017 · 8 comments

lukeocamden commented Jul 28, 2017

Hello, consider the following interleaving:

Subscription::poll

const int length = std::atomic_load(&m_imagesLength); // reads e.g. 5

Subscription::removeImage

std::atomic_store(&m_imagesLength, length - 1); // writes 4
std::atomic_store(&m_images, newArray); // writes pointer to array of length 4

Subscription::poll

Image *images = std::atomic_load(&m_images); // reads pointer to array of length 4
...
fragmentsRead += images[i].poll(fragmentHandler, fragmentLimit - fragmentsRead); // i still runs up to the stale length (5) while the new array has only 4 entries: undefined behavior

Does that make sense?

This would be simpler to reason about, and perhaps faster, if only one write (say, of a pointer to a vector) were required for the conductor to communicate the updated set of images. So the conductor would do:

// <updates m_conductorImageList>
delete m_latestImagesList.exchange(new std::vector<Image>(m_conductorImageList));

and the subscriber does

if (m_latestImagesList.load(std::memory_order_relaxed)) // rarely true
{
    std::unique_ptr<std::vector<Image>> latest(m_latestImagesList.exchange(nullptr));
    m_subscriberImagesList = *latest;
}
<uses m_subscriberImagesList>
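
For concreteness, here is a minimal self-contained sketch of that handoff; the two member names are the ones suggested above, everything else is illustrative:

#include <atomic>
#include <memory>
#include <vector>

struct Image { }; // stand-in for the real Image

class Subscription
{
public:
    // Conductor thread: publish a new snapshot, deleting any previous snapshot
    // the subscriber never picked up (exchange returns it, or nullptr).
    void updateImages(const std::vector<Image>& conductorImageList)
    {
        delete m_latestImagesList.exchange(new std::vector<Image>(conductorImageList));
    }

    // Subscriber thread: adopt a pending snapshot, then poll over a list that
    // only this thread ever touches.
    void poll()
    {
        if (m_latestImagesList.load(std::memory_order_relaxed)) // rarely true
        {
            std::unique_ptr<std::vector<Image>> latest(m_latestImagesList.exchange(nullptr));
            m_subscriberImagesList = *latest;
        }
        // <uses m_subscriberImagesList>
    }

private:
    std::atomic<std::vector<Image>*> m_latestImagesList{nullptr};
    std::vector<Image> m_subscriberImagesList;
};

The subscriber only ever dereferences the pointer returned by its exchange, never one it merely loaded, so the conductor deleting an unconsumed list cannot race with a reader.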

Of course, any change that fixes the bug is good for me!

Thanks!

tmontgomery (Contributor) commented

The case of a removeImage occurring while a poll is preempted between its two loads is possible.

Testing with std::vector did not come close to the needed performance on the iteration side compared to a raw array. So, a raw array does much better in this case.

A couple of potential fixes here, though:
1. Encapsulate the array into a dynamic array type. This was done originally and dropped because some optimizations were missed. It is well past time to revisit.
2. Re-read the array length after grabbing the array to make sure it has not changed. This might prevent some migration and hinder optimizations.
3. Add a change-number scheme such as the one used for the driver conductor/receiver handoff for SMs and NAKs (see the sketch after this list). This might also deal well with lingering the array and buffers.

All three need to be tried and looked at.
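
For the third option, a rough sketch of the two-counter change-number idea; the names and structure are illustrative, not the driver's actual code:

#include <atomic>
#include <cstddef>

struct Image { int poll() { return 0; } }; // stand-in for the real Image

std::atomic<long> changeBegin{0};          // incremented before a change starts
std::atomic<long> changeEnd{0};            // set to the same number once it completes
std::atomic<Image*> imagesArray{nullptr};
std::atomic<std::size_t> imagesLength{0};

// Conductor thread: mark the change in progress, swap in the new state, then
// mark it complete. The retired array must still be lingered, since a poller
// may briefly hold a reference to it.
void updateImages(Image* newArray, std::size_t newLength)
{
    const long changeNumber = changeBegin.load(std::memory_order_relaxed) + 1;
    changeBegin.store(changeNumber, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    imagesArray.store(newArray, std::memory_order_relaxed);
    imagesLength.store(newLength, std::memory_order_relaxed);
    changeEnd.store(changeNumber, std::memory_order_release);
}

// Polling thread: retry until no change began or completed while the pair was
// being read, so the observed array and length always belong together.
int pollImages()
{
    Image* array;
    std::size_t length;
    long changeNumber;

    do
    {
        changeNumber = changeEnd.load(std::memory_order_acquire);
        array = imagesArray.load(std::memory_order_relaxed);
        length = imagesLength.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
    }
    while (changeNumber != changeBegin.load(std::memory_order_relaxed));

    int fragments = 0;
    for (std::size_t i = 0; i < length; i++)
    {
        fragments += array[i].poll();
    }
    return fragments;
}

A poller that loses the race simply retries until the writer has finished, so the common case stays a pair of uncontended loads.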

tmontgomery self-assigned this Jul 31, 2017
goglusid commented Jul 31, 2017

@tmontgomery What about the following? You still get the performance of using an array, but you need only one atomic load, so there is no window for the interleaving above.

struct ImageArray
{
    size_t imageCount;
    Image* images[1];
};

You could also allocate the count and the array as one contiguous memory block, to avoid having an array of one image when the count is zero. It is the same idea anyway.

ImageArray* images = std::atomic_load(&m_images);

mjpt777 (Contributor) commented Jul 31, 2017

@tmontgomery Re-reading the array length could suffer from the ABA problem. The better solution is a change number applied to two counters, written before and after the change completes. We use this effectively in the PublicationImage for NAK signalling, which is your third suggestion.

@goglusid How do you ensure the lifetime of accesses to the pointed-to struct is appropriate?

goglusid commented Jul 31, 2017

@mjpt777 Well, in the current design you use a timeout to deal with the lifetime of these image arrays and logbuffers, so my proposed change still works with lingering... even if relying on timeouts is risky. Fair enough, though: a high enough timeout would be safe in practice.

That being said, in my super-duper version of Aeron ;) I implemented the following:

In the ClientConductor:

  • For every resource being lingered, I increment a long long m_lingeringPosition and take a snapshot of this position along with the actual resource pointer.
  • Every RESOURCE_TIMEOUT_MS I update an atomic m_conductorLingeringPosition like this:

m_conductorLingeringPosition.store(m_lingeringPosition, std::memory_order_release);

  • Once in a while the application calls a new function on the Aeron class, let's say Aeron::consumeLingerResources().

inline void ClientConductor::consumeLingerResources()
{
    const long long clientLingeringPosition(m_clientLingeringPosition.load(std::memory_order_acquire));
    const long long conductorLingeringPosition(m_conductorLingeringPosition.load(std::memory_order_acquire));

    if (clientLingeringPosition < conductorLingeringPosition)
    {
        m_clientLingeringPosition.store(conductorLingeringPosition, std::memory_order_release);
    }
}
  • The ClientConductor is then safe to delete lingering resources once m_clientLingeringPosition >= m_conductorLingeringPosition; a sketch of the reclaim follows below.

Note that this approach only works if all subscribers and ClientConductor::consumeLingerResources are called from a single thread.
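
For concreteness, the conductor-side reclaim could look like the following; the LingeredResource record and member names are illustrative:

#include <atomic>
#include <deque>
#include <memory>

struct Resource { /* image array, logbuffers, ... */ };

struct LingeredResource
{
    long long lingerPosition;            // snapshot of m_lingeringPosition at retirement
    std::unique_ptr<Resource> resource;
};

std::deque<LingeredResource> m_lingeredResources; // conductor-only, ordered by position
std::atomic<long long> m_clientLingeringPosition{0};

// Conductor thread: anything retired at or before the position the client has
// already consumed can no longer be referenced by the application, so delete it.
void reclaimLingeredResources()
{
    const long long clientPosition = m_clientLingeringPosition.load(std::memory_order_acquire);

    while (!m_lingeredResources.empty()
        && m_lingeredResources.front().lingerPosition <= clientPosition)
    {
        m_lingeredResources.pop_front(); // unique_ptr deletes the resource
    }
}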

I'll take a look at the PublicationImage's NAK signalling.

tmontgomery (Contributor) commented

@mjpt777 yes, re-reading the length suffers from ABA. I realized it late last night while thinking on it. Encapsulating the length and array into a dynamic array type (or @goglusid's struct) is also appealing if it lets the optimizations work themselves out. The lifetime is easy since we already linger the array itself anyway. But that can be cleaned up as well.

@goglusid one of the items we want to do is to refcnt the logbuffers. So, that complicates the lingering slightly as then logbuffers can be polled from multiple threads.

goglusid commented Aug 2, 2017

@tmontgomery In order to avoid separately allocating the ImageList and the actual Image array, how about the following?

#include <cstddef>
#include <cstdint>
#include <new>

class ImageList
{
public:
    // Allocate the header and the Image slots as one contiguous block, then
    // construct the Images in place.
    static ImageList* create(size_t length)
    {
        ImageList* imageList = new (new uint8_t[sizeof(ImageList) + (sizeof(Image) * length)]) ImageList(length);

        for (size_t i = 0; i < length; i++)
        {
            new (&(imageList->images()[i])) Image();
        }

        return imageList;
    }

    // Destroy the Images in place, then free the whole block.
    void deleteIt()
    {
        for (size_t i = 0; i < m_length; i++)
        {
            images()[i].~Image();
        }

        delete[] reinterpret_cast<uint8_t*>(this);
    }

    inline Image* images()
    {
        return reinterpret_cast<Image*>(reinterpret_cast<uint8_t*>(this) + sizeof(ImageList));
    }

    const size_t m_length;

private:
    ImageList(size_t length) : m_length(length)
    {
    }

    ~ImageList() {}
};
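
A hypothetical usage sketch: the conductor publishes a new list with a single atomic exchange and lingers the old one. The m_imageList member, newLength, and lingerResource call are illustrative, not the client's actual API:

// Conductor thread
ImageList* newList = ImageList::create(newLength);
// ... populate newList->images()[0 .. newLength - 1] ...
ImageList* oldList = m_imageList.exchange(newList); // m_imageList is std::atomic<ImageList*>
lingerResource(oldList);                            // later calls oldList->deleteIt()

// Polling thread: one load yields a consistent length and array
ImageList* list = m_imageList.load(std::memory_order_acquire);
for (size_t i = 0; i < list->m_length; i++)
{
    // poll list->images()[i]
}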

tmontgomery (Contributor) commented

@goglusid we could do this. Actually, I have a different but similar option in mind. We intend to make more changes soon, though. One is refcounting the logbuffers so that the footprint of the mappings can be lower. Another is a change-number technique, similar to what you proposed, for lingering resources. So, this is a temporary change until after the next release, when we make those further changes.

mjpt777 (Contributor) commented Aug 10, 2019

This was addressed some time ago, with the latest solution applied in commit 8031771.

mjpt777 closed this as completed Aug 10, 2019