Deadlocks on NetBSD around fork() #76600
We do consider UB in the standard library a bug, but we can only fix it if you point out where the issue is. "rewrite the |
There is one known async-signal-safety issue in the standard library, the use of
EDIT: There is one more issue: Command uses execvp, which isn't async-signal-safe, unlike the exec variants that don't need to search PATH. |
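One way to avoid a PATH search in the child is to do the lookup in the parent, before fork(), and hand the child an absolute path it can pass to execv or execve. The following is only a sketch of that idea in C, not what the Rust standard library does; `resolve_in_path`, `is_executable_file`, and the `/bin:/usr/bin` fallback are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: true if `path` is a regular file we may execute. */
static int is_executable_file(const char *path) {
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode) && access(path, X_OK) == 0;
}

/* Hypothetical helper: resolve `name` against $PATH while still in the
 * parent, where getenv() and snprintf() are unproblematic.
 * Returns 0 and writes the result to `out`, or -1 if nothing was found
 * or `out` is too small. */
static int resolve_in_path(const char *name, char *out, size_t outlen) {
    int n;
    if (strchr(name, '/') != NULL) {
        /* Names containing a slash are used as-is, like execvp() does. */
        n = snprintf(out, outlen, "%s", name);
        if (n < 0 || (size_t)n >= outlen) return -1;
        return is_executable_file(out) ? 0 : -1;
    }
    const char *path = getenv("PATH");
    if (path == NULL) path = "/bin:/usr/bin"; /* assumed fallback */
    for (;;) {
        size_t len = strcspn(path, ":");
        if (len == 0)
            n = snprintf(out, outlen, "./%s", name); /* empty entry means cwd */
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)len, path, name);
        if (n >= 0 && (size_t)n < outlen && is_executable_file(out))
            return 0;
        if (path[len] == '\0') break; /* last PATH entry checked */
        path += len + 1;
    }
    return -1;
}
```

With the path resolved up front, the child only needs execv or execve, both of which are on the POSIX async-signal-safe list.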
I can reproduce a deadlock on NetBSD; not including a reproducer, since it seems any trivial one will do. Backtrace generally shows
The problem persisted after I patched calls to functions that aren't signal safe (removed the call to
The problem persisted after I rewrote the test case in C:

#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 4

static pthread_t threads[N];

/* Fork a child that exits immediately, then wait for it in the parent. */
static void spawn() {
    pid_t p;
    p = fork();
    assert(p != -1);
    if (p == 0) {
        _exit(0);
    } else {
        pid_t r;
        int wstatus;
        r = waitpid(p, &wstatus, 0);
        assert(r != -1);
        assert(r == p);
    }
}

/* Thread body: fork and reap ten children in a row. */
static void *run(void *arg) {
    for (int i = 0; i != 10; ++i) {
        spawn();
    }
    return NULL;
}

int main() {
    int r;
    /* Start N threads that all fork concurrently, then join them. */
    for (int i = 0; i != N; ++i) {
        r = pthread_create(threads + i, NULL, run, NULL);
        assert(r == 0);
    }
    for (int i = 0; i != N; ++i) {
        r = pthread_join(threads[i], NULL);
        assert(r == 0);
    }
    return 0;
} |
So, doesn't this mean that this is not a Rust bug but a NetBSD bug?
It would still be helpful to have one. Little details often matter surprisingly much. |
What is that based on? I checked the manpage and couldn't find information about signal safety of the various |
Backtrace of child process after deadlock:
Potentially related NetBSD bug report https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=49816 The |
This matches pretty well with one of the backtraces I've seen with
the hangs I had observed earlier have so far not re-surfaced with this particular program. So... Both this and the earlier C-based reproducer appear to indicate that this may be a problem in NetBSD and not in Rust. However, I think the C-based reproducer above has a different root cause. On my 9.0/amd64 test system, it didn't hang with N=4, but did with N=400, leaving one zombie and a number of threads in either the "wait" or the "parked" state. |
I updated NetBSD from 9.0 to the latest daily snapshot. The previous problems haven't reproduced so far, but I encountered another problem and reduced it to a program in C. Attaching and then detaching a debugger lets execution continue, so this seems to be some form of missed notification.

C source:

#include <stdio.h>
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define N 4

static pthread_t threads[N];

/* Fork a child that exits immediately, then wait for it in the parent. */
static void spawn() {
    pid_t p;
    p = fork();
    assert(p != -1);
    if (p == 0) {
        _exit(0);
    } else {
        pid_t r;
        int wstatus;
        r = waitpid(p, &wstatus, 0);
        assert(r != -1);
        assert(r == p);
    }
}

/* Thread body: query and destroy the thread's own attributes, then fork. */
static void *run(void *arg) {
    int r;
    pthread_attr_t attr;
    pthread_t current = pthread_self();
    r = pthread_getattr_np(current, &attr);
    assert(r == 0);
    r = pthread_attr_destroy(&attr);
    assert(r == 0);
    for (int i = 0; i != 10; ++i) {
        spawn();
    }
    return NULL;
}

int main() {
    int r;
    for (int i = 0; i != N; ++i) {
        r = pthread_create(threads + i, NULL, run, NULL);
        assert(r == 0);
    }
    for (int i = 0; i != N; ++i) {
        r = pthread_join(threads[i], NULL);
        assert(r == 0);
    }
    return 0;
}

Backtrace
EDIT: This one can be reproduced without fork.
EDIT: Similar problems were discussed on the current-users mailing list: |
Should this be closed, then? |
To the best of my knowledge, |
@tmiasko I had someone run your C program on NetBSD-current (not just NetBSD-9 stable), and there apparently the problem with stuck threads / processes was not reproducible, so there is hope that this bug won't be there when NetBSD 10 is released. |
I can reproduce the last issue on the daily snapshot from 2020-09-18, usually after just a few seconds of running the program in a loop. If you do reproduce it at some point, I would appreciate it if you could report this upstream, since I generally don't use NetBSD. Here is a simplified reproducer, since it turns out neither fork nor pthread_getattr_np is required; the latter only mattered because it allocates memory:

#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define N 4

static pthread_t threads[N];

/* Each thread performs a single allocation; that is enough to trigger the hang. */
static void *run(void *arg) {
    return malloc(1024);
}

int main() {
    int r;
    /* Calls are kept outside assert() so they still run when NDEBUG is defined. */
    for (int i = 0; i != N; ++i) {
        r = pthread_create(&threads[i], NULL, run, NULL);
        assert(r == 0);
    }
    for (int i = 0; i != N; ++i) {
        r = pthread_join(threads[i], NULL);
        assert(r == 0);
    }
    return 0;
} |
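Since the hang only shows up after running the reproducer repeatedly, a small harness that kills a run after a deadline can make "it hung" observable without watching the terminal. This is a rough sketch, not something used in the thread: `run_with_timeout` is a hypothetical helper, and the deadline is approximated by counting 10 ms polling ticks rather than by reading a clock.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with `argv` and wait at most roughly
 * `timeout_sec` seconds. Returns 0 if the child terminated in time,
 * 1 if it had to be killed (treated as a hang), -1 on error. */
static int run_with_timeout(const char *path, char *const argv[], int timeout_sec) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        execv(path, argv); /* only async-signal-safe calls in the child */
        _exit(127);        /* exec failed */
    }
    const struct timespec tick = {0, 10 * 1000 * 1000}; /* 10 ms */
    long ticks_left = (long)timeout_sec * 100;
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return 0;                   /* child finished */
        if (r == -1 && errno != EINTR) return -1; /* unexpected error */
        if (ticks_left-- <= 0) break;             /* deadline passed */
        nanosleep(&tick, NULL);
    }
    kill(pid, SIGKILL);    /* assume the child is stuck */
    waitpid(pid, NULL, 0); /* reap it so no zombie is left behind */
    return 1;
}
```

Calling this in a loop on the compiled reproducer (for example a hypothetical `./repro` binary) and stopping at the first return value of 1 would give a rough count of how many runs it takes to hit the hang.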
I'm going to close this, please report it upstream as a NetBSD issue. It's neither related to rust nor |
@tmiasko I reported http://gnats.netbsd.org/55670 -- this is apparently caused by concurrency bugs in our "jemalloc" implementation in the netbsd-9 code base. Reportedly this is fixed in NetBSD-current, and will be in NetBSD 10.0 when that comes out. |
I do realize this is likely to be ignored, but I just cannot let go of bringing it up anyway...
It appears to me that rust in certain circumstances is relying on undefined behavior, in that it is in general a multi-threaded program, and will in many cases do fork(), and perform many non-trivial tasks between fork() and exec(). Some of these things have manifested themselves in run-time problems observed on NetBSD: (detected) deadlocks in ld.elf_so (resulting in abort), deadlocks related to malloc() manifesting as hangs etc. In NetBSD, the fork() man page contains this passage:
A Linux/Debian man page for fork(2) also contains a similar passage:
Even though I don't point to offending code here, I have it on good authority that this is the root cause of the issues we have been observing on NetBSD. Is there any chance that the rust code could have a make-over so as to not rely on undefined behavior in this aspect?
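To make the request concrete, here is a minimal C sketch of the discipline the fork() man pages describe for multithreaded programs: everything that allocates or takes locks is done before fork(), and the child calls only async-signal-safe functions until exec. `spawn_prepared` is a hypothetical name, and this illustrates the pattern only; it is not a claim about what the Rust standard library currently does.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with argv/envp that were fully built
 * before the call. Returns the child's exit status, or -1 if fork or
 * wait failed or the child did not exit normally. */
static int spawn_prepared(const char *path, char *const argv[], char *const envp[]) {
    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        /* Child of a possibly multithreaded parent: no malloc, no stdio,
         * no PATH lookup. execve() and _exit() are both on the POSIX
         * async-signal-safe list. */
        execve(path, argv, envp);
        _exit(127); /* exec failed */
    }
    int status;
    pid_t r;
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR); /* retry if interrupted by a signal */
    if (r != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

A pattern like this keeps the child away from locks held by other threads at the time of fork(), but it cannot work around bugs inside libc or the kernel themselves.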