C++ Client Crashes on ClientConductor::onInterServiceTimeout #371
If we delayed the call to MemoryMappedFile::cleanUp() by X ms after the actual ClientConductor::onInterServiceTimeout, we could avoid this crash. The period X would represent the maximum execution time of a single call to Publication::offer, or the time elapsed between calling Publication::tryClaim and BufferClaim.commit |
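The delayed clean-up proposed above could be sketched roughly as follows. This is only an illustration, not Aeron code: `Mapping` and `LingeringResources` are hypothetical names, time is passed in explicitly as milliseconds, and the sketch assumes a single thread (the conductor) calls both methods.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a mapped log buffer; not Aeron's MemoryMappedFile.
struct Mapping
{
    explicit Mapping(bool *unmappedFlag) : m_unmapped(unmappedFlag) {}
    ~Mapping() { *m_unmapped = true; } // a real mapping would unmap the file here
    bool *m_unmapped;
};

// Hypothetical holder that keeps resources alive until a deadline has passed.
class LingeringResources
{
public:
    // Park the resource instead of releasing it immediately.
    void linger(std::shared_ptr<Mapping> resource, std::int64_t nowMs, std::int64_t lingerMs)
    {
        m_entries.push_back({std::move(resource), nowMs + lingerMs});
    }

    // Drop every entry whose deadline has been reached; returns how many were dropped.
    std::size_t releaseExpired(std::int64_t nowMs)
    {
        const std::size_t before = m_entries.size();
        m_entries.erase(
            std::remove_if(
                m_entries.begin(),
                m_entries.end(),
                [nowMs](const Entry &entry) { return nowMs >= entry.deadlineMs; }),
            m_entries.end());
        return before - m_entries.size();
    }

private:
    struct Entry
    {
        std::shared_ptr<Mapping> resource;
        std::int64_t deadlineMs;
    };

    std::vector<Entry> m_entries;
};
```

With a linger of 1000 ms, a mapping parked at time 0 survives a `releaseExpired(999)` call and is destroyed by `releaseExpired(1000)`. The window only has to outlast the longest in-flight offer or claim, which is why this narrows the race rather than removing it.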
For reference, this is the code I mean when I talk about managing the Image's log buffers as lingering resources: void ClientConductor::onUnavailableImage(
} |
cc @mjpt777 Lingering doesn't solve the underlying issue. The same thing exists in the Java version, I do believe. Lingering simply moves the time horizon. At its heart this is a race between the munmap due to the inter service timeout and the BufferClaim commit/abort operations. |
The Java code does not call the unavailable handlers when a forced close happens. I've also just pushed a change that will linger the resources for 1ms on a normal close and 1s on an inter service timeout. |
I will reflect in C++ in the next couple days if not sooner. Also, I want to make the C++ API have the agent invoker type option soon. |
I agree that lingering only reduces the probability of hitting this issue. If we stored a smart pointer to the log buffers in the Publication instance returned by findPublication, then the application would control the lifetime of the log buffers with no possible race. Am I missing anything here? |
@goglusid Hmmm. Very very good point. That might work. Will give it a think. Yeah, that might be a nice way to handle it. Might also be usable for Java as well. Keep it around until |
@tmontgomery I meant keep it around until Publication::~Publication |
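The shared-ownership idea discussed here can be sketched as below. The classes are simplified, hypothetical stand-ins that borrow the names from this thread; they are not the Aeron implementation. `Conductor::onTimeout` plays the role of ClientConductor::onInterServiceTimeout, and `write` stands in for a tryClaim followed by a commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for aeron::LogBuffers: owns the memory a claim writes into.
class LogBuffers
{
public:
    explicit LogBuffers(std::size_t length) : m_buffer(length, 0) {}
    std::uint8_t *data() { return m_buffer.data(); }

private:
    std::vector<std::uint8_t> m_buffer;
};

// Simplified Publication that shares ownership of its log buffers.
class Publication
{
public:
    explicit Publication(std::shared_ptr<LogBuffers> logBuffers) :
        m_logBuffers(std::move(logBuffers))
    {
    }

    // Stand-in for tryClaim + commit: touches the buffer memory directly.
    void write(std::size_t offset, std::uint8_t value) { m_logBuffers->data()[offset] = value; }
    std::uint8_t read(std::size_t offset) const { return m_logBuffers->data()[offset]; }

private:
    std::shared_ptr<LogBuffers> m_logBuffers; // released only in ~Publication
};

// Simplified conductor that also tracks the buffers it created.
class Conductor
{
public:
    std::shared_ptr<Publication> addPublication()
    {
        auto logBuffers = std::make_shared<LogBuffers>(64);
        m_logBuffers.push_back(logBuffers);
        return std::make_shared<Publication>(std::move(logBuffers));
    }

    // Stand-in for the inter service timeout: drops only the conductor's references.
    void onTimeout() { m_logBuffers.clear(); }
    std::size_t tracked() const { return m_logBuffers.size(); }

private:
    std::vector<std::shared_ptr<LogBuffers>> m_logBuffers;
};
```

After `onTimeout()` the conductor no longer references the buffers, but a Publication the application still holds keeps them alive, so a later `write` touches valid memory. The buffers are freed when the last Publication referencing them is destroyed.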
Agreed. Was thinking about Java as well. Which requires an explicit close of the Publication instead of it simply going out of scope. |
…ivePublication to keep mapping around while in scope. For #371. Updated naming and layout for subcriber position in available image.
@goglusid go ahead and see about this now. The Publication (and ExclusivePublication) have a shared_ptr to the LogBuffers. So, this should be cleaner now. |
@tmontgomery Your awesomeness knows no bounds! ;p Problem solved. Thanks :D |
Thanks! No worries! We'll be making some other changes in this area shortly as well. |
@tmontgomery Could you please elaborate a bit on the other changes in this area? |
Experimenting with reference counting the mappings for #365 so multiple mappings are not needed. Also want to add the agent invoker style thread control to C++. And also change the mapping flags. |
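One way reference-counted mappings might look is a cache of weak pointers keyed by file name, so that a second request for the same file reuses the live mapping. This is a guess at the general shape only; the thread does not describe the actual design for #365, and `Mapping` and `MappingCache` are hypothetical names. The sketch is single-threaded and does no real memory mapping.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a memory-mapped file.
struct Mapping
{
    explicit Mapping(std::string name) : filename(std::move(name)) {}
    std::string filename;
};

// Hypothetical cache that hands out one shared mapping per file name.
class MappingCache
{
public:
    std::shared_ptr<Mapping> acquire(const std::string &filename)
    {
        auto it = m_mappings.find(filename);
        if (it != m_mappings.end())
        {
            // Reuse the mapping if some owner is still keeping it alive.
            if (auto existing = it->second.lock())
            {
                return existing;
            }
        }

        // Otherwise create a fresh mapping and remember it weakly,
        // so the cache itself never extends the mapping's lifetime.
        auto created = std::make_shared<Mapping>(filename);
        m_mappings[filename] = created;
        return created;
    }

private:
    std::map<std::string, std::weak_ptr<Mapping>> m_mappings;
};
```

Two `acquire` calls for the same name return the same object while either result is alive; once all owners release it, the next `acquire` creates a new mapping.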
If the C++ client still holds pointers into the log buffers when the following stack of functions is executed, it crashes.
Following is how it can happen:
Thread#1: Call Publication::tryClaim
Thread#1: Use the BufferClaim...
Thread#2[ConductorThread]: Detects a timeout and executes the following stack.
Thread#1: Calls BufferClaim.commit();
Obviously, I'm debugging here, so I reach the 5-second timeout.
That being said, to be thread safe it seems the MemoryMappedFiles need to be managed as lingering resources, like the subscription's images.
aeron::util::MemoryMappedFile::cleanUp() Line 206
aeron::util::MemoryMappedFile::~MemoryMappedFile() Line 219
std::_Ref_count<aeron::util::MemoryMappedFile>::_Destroy() Line 578 + 0x23 bytes
std::_Ref_count_base::_Decref() Line 538
std::vector<std::shared_ptr<aeron::util::MemoryMappedFile>,std::allocator<std::shared_ptr<aeron::util::MemoryMappedFile> > >::_Destroy(std::shared_ptr<aeron::util::MemoryMappedFile> * _First=0x00549310, std::shared_ptr<aeron::util::MemoryMappedFile> * _Last=0x00549318) Line 1885 + 0x40 bytes
std::vector<std::shared_ptr<aeron::util::MemoryMappedFile>,std::allocator<std::shared_ptr<aeron::util::MemoryMappedFile> > >::_Tidy() Line 1952
aeron::LogBuffers::~LogBuffers() Line 84 + 0x56 bytes
aeron::LogBuffers::`scalar deleting destructor'() + 0xf bytes
std::_Ref_count_obj<aeron::LogBuffers>::_Destroy() Line 1327
std::_Ref_count_base::_Decref() Line 538
aeron::ClientConductor::PublicationStateDefn::~PublicationStateDefn() + 0x65 bytes
std::vector<aeron::ClientConductor::PublicationStateDefn,std::allocator<aeron::ClientConductor::PublicationStateDefn> >::clear() Line 1616 + 0x64 bytes
aeron::ClientConductor::onInterServiceTimeout(__int64 now=1499112928196) Line 548
aeron::ClientConductor::onHeartbeatCheckTimeouts() Line 303
aeron::concurrent::AgentRunner<aeron::ClientConductor,aeron::concurrent::SleepingIdleStrategy>::run() Line 64 + 0x2e bytes