C++ client: heap buffer overflow in Subscription::poll if conductor thread calls Subscription::removeImage concurrently #383
Comments
A removeImage landing while a poll is interrupted is possible. Testing with a vector did not show nearly the needed performance on the iteration side compared with a raw array, so a raw array does much better in this case. There are a couple of potential fixes here, though. All three need to be tried and looked at.
@tmontgomery What about the following? You still get the performance of using an array, but you need only one atomic load, so no interleaving.
struct ImageArray
You could also allocate the size and the array as one contiguous memory block to avoid having an array of 1 image when the count is zero. It is the same idea anyway.
ImageArray* images = std::atomic_load(&m_images);
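The body of `struct ImageArray` did not survive in this thread, so the following is only a minimal sketch of the idea as described: length and storage live in one struct, and the subscriber takes a single atomic pointer load per poll. `Image`, `SubscriptionSketch`, `addImage` and `releaseImageArray` are hypothetical names for illustration, not the Aeron API, and freeing the retired array is left to the caller because the thread relies on lingering for that.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for aeron::Image; only what the sketch needs.
struct Image
{
    int sessionId;
};

// Length and storage travel together, so one pointer load yields a consistent pair.
struct ImageArray
{
    Image* images;
    std::size_t length;
};

class SubscriptionSketch
{
public:
    SubscriptionSketch() : m_images(new ImageArray{nullptr, 0}) {}

    ~SubscriptionSketch()
    {
        ImageArray* current = m_images.load(std::memory_order_acquire);
        delete[] current->images;
        delete current;
    }

    // Subscriber side: one acquire load, then iterate the snapshot it returned.
    template <typename F>
    int poll(F&& handler)
    {
        const ImageArray* snapshot = m_images.load(std::memory_order_acquire);
        int count = 0;
        for (std::size_t i = 0; i < snapshot->length; i++)
        {
            handler(snapshot->images[i]);
            ++count;
        }
        return count;
    }

    // Conductor side (single writer assumed): copy, append, publish with one
    // release store. Returns the retired array; the caller must keep it alive
    // (linger it) until no poll can still be iterating over it.
    ImageArray* addImage(const Image& image)
    {
        ImageArray* old = m_images.load(std::memory_order_relaxed);
        ImageArray* next = new ImageArray{new Image[old->length + 1], old->length + 1};
        for (std::size_t i = 0; i < old->length; i++)
        {
            next->images[i] = old->images[i];
        }
        next->images[old->length] = image;
        m_images.store(next, std::memory_order_release);
        return old;
    }

private:
    std::atomic<ImageArray*> m_images;
};

// Frees a retired array once it is known that no poller can still hold it.
inline void releaseImageArray(ImageArray* retired)
{
    delete[] retired->images;
    delete retired;
}
```

Because the subscriber never reads the length separately from the array, there is no window in which a stale length can be paired with a newer, shorter array. The lifetime question raised later in the thread still applies to the retired `ImageArray`.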
@tmontgomery Re-reading the array length could suffer from the ABA problem. A better solution is to have a change number applied to two counters, one written before the change and one after the change is complete. We use this effectively in the
@goglusid How do you ensure the access lifetimes of the pointed-to struct are appropriate?
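For readers unfamiliar with the change-number technique, here is a rough sketch of one way two counters can bracket an update. It is an assumption about the general shape of the idea, not the code referred to above; `VersionedImages`, `Snapshot`, `publish` and `read` are hypothetical names. All atomics use the default sequentially consistent ordering to keep the reasoning simple, and a single writer (the conductor) is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for aeron::Image.
struct Image
{
    int sessionId;
};

struct Snapshot
{
    Image* images;
    std::size_t length;
};

class VersionedImages
{
public:
    // Conductor thread only: bump the "begin" counter, write the fields,
    // then set the "end" counter to the same change number.
    void publish(Image* images, std::size_t length)
    {
        const std::int64_t change = m_beginChange.load() + 1;
        m_beginChange.store(change);
        m_images.store(images);
        m_length.store(length);
        m_endChange.store(change);
    }

    // Subscriber thread: read "end", read the fields, read "begin".
    // If both counters agree, no update overlapped the field reads.
    Snapshot read() const
    {
        for (;;)
        {
            const std::int64_t end = m_endChange.load();
            Image* images = m_images.load();
            const std::size_t length = m_length.load();
            if (m_beginChange.load() == end)
            {
                return Snapshot{images, length};
            }
            // An update was in flight; retry.
        }
    }

private:
    std::atomic<std::int64_t> m_beginChange{0};
    std::atomic<std::int64_t> m_endChange{0};
    std::atomic<Image*> m_images{nullptr};
    std::atomic<std::size_t> m_length{0};
};
```

Unlike re-reading the length, the change number only ever increases, so an add followed by a remove that restores the old length is still detected. The sketch says nothing about when the old array may be freed; that remains a separate lifetime problem.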
@mjpt777 Well, in the current design you use a timeout to deal with the lifetime of these image arrays and logbuffers. So my proposed change still works with lingering, even if it is risky to use timeouts. Fair enough, though: a high enough timeout would be practically safer. That being said, in my super-duper version of Aeron ;) I implemented the following. In the ClientConductor:
m_conductorLingeringPosition.store(m_lingeringPosition, std::memory_order_release);
inline void ClientConductor::consumeLingerResources()
Note that this approach only works if all subscribers and ClientConductor::consumeLingerResources are called by a single thread. I'll take a look at the PublicationImage's NAK signalling.
@mjpt777 yes, re-reading the length suffers from ABA. I realized it late last night while thinking on it. Encapsulating the length and array into a dynamic array (or @goglusid's struct) is also appealing if it can allow the optimizations to work themselves out. The lifetime is easy since we already linger the array itself anyway. But that can be cleaned up as well. @goglusid one of the items we want to do is to refcnt the logbuffers. So, that complicates the lingering slightly, as logbuffers can then be polled from multiple threads.
@tmontgomery In order to avoid allocating the ImageList and the actual Image array, how about the following?
class ImageList
static ImageList* create(size_t length)
}
void deleteIt()
}
inline Image* images() const
size_t m_length;
private:
~ImageList() {}
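The method bodies of `ImageList` were lost above, so the following is a guess at the intent rather than the code originally posted: the header and the `Image` elements share one heap block, obtained with a single allocation. The names `create`, `deleteIt`, `images` and `m_length` come from the fragments; the `Image` stand-in, the private constructor and `headerSize` are assumptions added for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for aeron::Image.
struct Image
{
    int sessionId = 0;
};

class ImageList
{
public:
    // One allocation holds the header followed by `length` Image objects.
    static ImageList* create(std::size_t length)
    {
        void* block = ::operator new(headerSize() + length * sizeof(Image));
        ImageList* list = new (block) ImageList(length);
        Image* first = list->images();
        for (std::size_t i = 0; i < length; i++)
        {
            new (first + i) Image();
        }
        return list;
    }

    // Destroys the elements and the header, then frees the single block.
    void deleteIt()
    {
        Image* first = images();
        for (std::size_t i = 0; i < m_length; i++)
        {
            first[i].~Image();
        }
        void* block = this;
        this->~ImageList();
        ::operator delete(block);
    }

    // The elements start right after the (alignment-padded) header.
    inline Image* images() const
    {
        std::uint8_t* base = reinterpret_cast<std::uint8_t*>(const_cast<ImageList*>(this));
        return reinterpret_cast<Image*>(base + headerSize());
    }

    std::size_t m_length;

private:
    explicit ImageList(std::size_t length) : m_length(length) {}
    ~ImageList() {}

    // sizeof(ImageList) rounded up to a multiple of alignof(Image).
    static constexpr std::size_t headerSize()
    {
        return (sizeof(ImageList) + alignof(Image) - 1) / alignof(Image) * alignof(Image);
    }
};
```

A private destructor forces callers through `deleteIt()`, which matters here because a plain `delete` would not know about the trailing elements. An empty list is simply a header with `m_length == 0` and no dummy element.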
@goglusid we could do this. Actually, I have a different option in mind that is similar. We intend to soon make more changes, though. One is the refcnt of logbuffers so that the footprint of the mappings can be lower. And another is a change number technique similar to what you proposed for lingering resources. So, this is a temporary change until after next release and we make more changes.
This was addressed some time ago, with the latest solution applied in commit 8031771.
Hello, consider the following interleaving:
Subscription::poll
Subscription::removeImage
Subscription::poll
Does that make sense?
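The individual steps of the interleaving are missing from this copy of the issue. As an illustration only, the sketch below replays one interleaving that would match the title, on a single thread so it is deterministic. It assumes the subscription publishes the array pointer and its length as two separate atomics and that poll reads the length first; `RacySubscription` and `replayInterleaving` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for aeron::Image.
struct Image
{
    int sessionId;
};

// Assumed layout: pointer and length are published independently.
struct RacySubscription
{
    std::atomic<Image*> images{nullptr};
    std::atomic<std::size_t> length{0};
};

// Returns true if poll ends up pairing a stale length with the new,
// shorter array, i.e. it would index past the end of that array.
bool replayInterleaving()
{
    Image before[2] = {{1}, {2}};
    Image after[1] = {{2}};

    RacySubscription sub;
    sub.images.store(before);
    sub.length.store(2);

    // Subscription::poll, first half: read the length (2).
    const std::size_t staleLength = sub.length.load();

    // Subscription::removeImage on the conductor thread: publish the
    // shorter array and its length.
    sub.images.store(after);
    sub.length.store(1);

    // Subscription::poll, second half: read the array pointer, which is
    // now the one-element array.
    Image* current = sub.images.load();

    // poll would now loop i = 0 .. staleLength - 1 over `current`;
    // the sketch only reports the mismatch instead of dereferencing.
    const std::size_t actualLength = sub.length.load();
    return current == after && staleLength > actualLength;
}
```

The replay never dereferences out of bounds; it only shows that the pair (length, pointer) seen by poll can come from two different updates.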
This would be simpler to reason about, and perhaps faster if there were only one write (say a pointer to a vector) required for the conductor to communicate the updated set of images. So conductor would do:
and subscriber does
Of course, any change that fixes the bug is good for me!
Thanks!