Fix a race condition in pthread_mutex_timedlock.c #12245
Conversation
I'm not sure turning all pthread_mutex_lock calls into busy loops is an acceptable solution. There could be threads that wait for long periods on locks, no?
Does this bug affect musl in general, or just Emscripten?
Also, does the same race condition apply to timedlock calls that do have a timeout? What if I call timedlock with a 30-minute timeout? I could hit the race condition and end up waiting 30 minutes, right?
I wonder if PR #10524 could also help resolve these race conditions, since it ran the entire Open POSIX Test Suite in WebAssembly.
I also wonder, if this is a real bug in musl, whether it might already have been fixed upstream.
@kleisauke Good idea, but it doesn't look like #10524 can help here - it fixes other issues like thread cancellation, but not core mutex operations or our proxying logic.

@sbc100 I did look at upstream musl, and the code has not changed significantly, so it's not fixed upstream AFAICT. I don't know if this affects only us or musl in general (it would be incredibly hard to test a native build of musl in a reliable enough way on a variant of #12258!).

Overall I think the diagnoses in these three PRs are incorrect, as has been pointed out. However, they are all necessary to fix #12258, and they each definitely fix a specific deadlock I encountered while debugging that testcase. So I guess we need to debug those three deadlocks more. I am a little unsure how best to do that though - how can I debug whether

The only good news here is that this is likely not urgent, as these corner cases are very hard to hit. They are also all quite old, so I don't think we have any recent regression here.
I have found the actual cause here, and will open a refactoring PR and then a fix PR shortly.
`pthread_mutex_lock` gives no time when it calls this function. In that case we don't loop; we just do a single wait, forever. In a rare race condition, the condition we care about may be set right before the wait, and the wait does not know to look for it (it looks for a wake event; it doesn't read the memory to check the value).
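To make the failure mode concrete, here is a minimal, self-contained sketch of that lost-wakeup pattern (a hypothetical illustration, not the actual musl/Emscripten code). A pthread condition variable stands in for the wake-event-only wait primitive, because a signal delivered before the waiter blocks is simply discarded:

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int locked = 1;  /* 1 = mutex word currently held */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

/* Broken pattern: check the mutex word, then do a single untimed wait. */
static void broken_lock(void) {
    while (atomic_exchange(&locked, 1) != 0) {
        /* RACE WINDOW: if the holder stores 0 and signals right here,
         * the signal finds no waiter and is discarded, and the wait
         * below blocks forever even though the lock is now free. */
        pthread_mutex_lock(&m);
        pthread_cond_wait(&c, &m);  /* waits for a wake event only;
                                       never re-reads `locked` */
        pthread_mutex_unlock(&m);
    }
    /* lock acquired */
}

static void unlock_and_wake(void) {
    atomic_store(&locked, 0);
    pthread_cond_signal(&c);  /* may fire before the waiter sleeps */
}
```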
Instead, just busy-wait. As this is for `pthread_mutex_lock`, the normal use case is probably something that needs to be fast anyhow.
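Continuing the hypothetical sketch above, the busy-wait version replaces the single untimed wait with a loop of short timed waits. The 1 ms interval is an arbitrary illustrative value; the point is that a wake lost in the race window now costs at most one interval rather than blocking forever:

```c
#include <time.h>

/* Busy-wait variant: same check, but each wait is bounded so the
 * mutex word is re-read at least every ~1 ms. */
static void busy_wait_lock(void) {
    while (atomic_exchange(&locked, 1) != 0) {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        ts.tv_nsec += 1000000;  /* wake up after ~1 ms at the latest */
        if (ts.tv_nsec >= 1000000000L) {
            ts.tv_sec++;
            ts.tv_nsec -= 1000000000L;
        }
        pthread_mutex_lock(&m);
        pthread_cond_timedwait(&c, &m, &ts);  /* ETIMEDOUT is harmless:
                                                 the loop re-checks */
        pthread_mutex_unlock(&m);
    }
    /* lock acquired */
}
```

The tradeoff raised earlier in the conversation still applies: a thread that waits a long time on a contended lock now pays for periodic wakeups instead of sleeping, which is why this approach leans on pthread_mutex_lock's expectation of a fast, mostly uncontended path.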