Add a testcase for pthreads race conditions #12258
printf("%d %d\n", i, total);
for (int j = 0; j < 1024; j++) {
  // allocation uses a mutex
  auto* rd = new random_device();
Can't this be done directly with the pthread_mutex APIs rather than indirectly depending on the implementation of "/dev/urandom"?
If you could write the test directly against pthread.h we could also see if it occurs in musl's native configuration.
No, the use of /dev/random is not just for a mutex - it's also for proxying (all file I/O is proxied to the main thread). That involves more than just a mutex.
Exactly, I was hoping for something a little more precise... there is so much going on here it's hard to know what this is testing.
Oh, definitely... yeah, this is not a great testcase. But it's the smallest I've managed so far that shows the issue, which is really hard to reproduce (as shown by it existing since forever, apparently).
If I have time I can try to reduce this more. But it may be better to focus on figuring out the actual cause of the problem, as that may suggest a testcase. We don't need to merge this urgently, and may never merge it, I guess.
[Commenting just to bump any notifications on this higher in my inbox]
I have found the actual cause here, and will open a refactoring PR and then a fix PR shortly. The fix PR will contain a variant of this test, turned off by default.
This is a manual test for race conditions (disabled by default as it is
very long, and checks for a race condition so it is inherently flaky)
that are fixed in #12243, #12244, and #12245.
It passes after the fixes in those PRs. Without them, it tends to fail
after 100 iterations out of 1000, so at least for me locally it fails
pretty consistently before the fixes.
Note that this was with Chrome. I saw the test fail on Firefox too,
but far more rarely. On Node I never saw it fail. So it definitely is
sensitive to timing somehow.
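
For readers who want a feel for the shape of such a test, here is a minimal sketch. It is not the testcase from this PR: it only mirrors the inner loop quoted in the review thread (allocate an object, touch a random device, repeat) across several threads. The function name hammer, the thread and iteration counts, and the use of std::thread and std::random_device are all assumptions made for illustration. Whether std::random_device actually opens a device file depends on the standard library, so this may or may not exercise file I/O proxying on a given build.

```cpp
#include <atomic>
#include <cassert>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper, not part of the PR: runs `num_threads` workers that
// each allocate, use, and free a std::random_device `iterations` times.
// Returns the total number of completed iterations across all workers.
long hammer(int num_threads, int iterations) {
  assert(num_threads > 0 && iterations > 0);
  std::atomic<long> total{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; t++) {
    workers.emplace_back([&total, iterations] {
      for (int j = 0; j < iterations; j++) {
        // allocation uses a mutex
        auto* rd = new std::random_device();
        // Draw one value so the device is actually used, not just constructed.
        (void)(*rd)();
        delete rd;
        total++;
      }
    });
  }
  // Wait for every worker; a lost wakeup or deadlock would show up as a hang.
  for (auto& w : workers) {
    w.join();
  }
  return total.load();
}
```

A caller would invoke something like hammer(4, 1024) many times in a row, built with -pthread, and treat a hang or a short count as a failure. Because the bug is timing sensitive, a passing run of a sketch like this says little on its own.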