Fix rare termination logic failures that could result in early shutdown #4556
I'll try to set up a VM to reproduce this locally at some point (probably not for a day or two), but I'm not sure what the issue is since everything else is happy.
Branch updated from 9c167df to b4f0960.
I'm able to reproduce the crash. Some info:
Based on the above, my current guess is that there's some sort of interaction between the … @ponylang/core, any thoughts on how to proceed?
Without any evidence to support my statements, this feels like memory clobbering. It also feels like a musl bug.
@dipinhora can you use the 20241203 musl image I just pushed and see if there is any difference? It should have a new musl that fixed "multiple race conditions". Worth a try! NEVERMIND. LLVM isn't building. I'm working on getting us able to use that newer musl.
@dipinhora I'll let you know when the new builder is ready so you can try with it.
@dipinhora you can rebase against main and get the new builder using the latest musl release.
Hi @dipinhora, the `changelog - fixed` label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the … The basic format of the release notes (using markdown) should be:
Thanks.
Prior to this commit, there was a very rare edge case in the termination logic that could cause an early shutdown, resulting in a segfault. This commit simplifies and reworks the shutdown/termination logic to make it more robust, with fewer edge cases. The logic now:

* does not un-noisy an actor from the ASIO thread until the relevant ASIO event is destroyed, instead of when it is unsubscribed. This is important because the ASIO subsystem still holds a reference to the actor and can send it a message until the ASIO event is destroyed, even if it has been unsubscribed
* always runs the CNF/ACK protocol to all schedulers instead of only the active ones
* disables scheduler scaling so that all schedulers stay active for the duration of the termination CNF/ACK protocol, avoiding or minimizing the complexity of schedulers suspending during the termination process
* makes the local scheduler tracking of ASIO noisiness more accurate and robust to messages being received out of order
Branch updated from b4f0960 to a71a1fa.
@dipinhora am I correct that the new builder with the new musl didn't address it?
No, but it gave more info in the backtrace. It's a race condition on program startup between … I'm working on a fix.
Awesome.
Branch updated from 8d32e15 to 31dfe32.
Branch updated from 8cea50f to d11b2fa.
@SeanTAllen Fix pushed and release notes added. I believe the …
Awesome work @dipinhora. Thanks.