Deadlock triggered in test_can_send_from_pipe #62
Comments
Adding a |
This happens on a couple tests, and it was fixed in fc703be for the tests. I'm going to leave this issue open, though, because the fix applied was a band-aid. |
I wonder if you're running into something similar that I tracked down in PR #55 |
Could be. I'll make a simpler reproducer and run it in GDB so I can get stack traces for all the threads. I used py-spy to dump the Python traceback of the main thread, but it didn't show what any of the nng threads were doing. |
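As a rough illustration of the limitation mentioned above: Python's standard `faulthandler` module can dump the Python-level traceback of every Python thread, but like py-spy's Python view it cannot show what native library threads are doing. The sketch below uses a stand-in worker (`blocked_worker` is hypothetical, not part of pynng) that blocks forever, then captures the dump as text.

```python
import faulthandler
import tempfile
import threading


def dump_python_threads() -> str:
    """Return the Python-level tracebacks of all Python threads as text."""
    # faulthandler writes straight to a file descriptor, so use a real file.
    with tempfile.TemporaryFile(mode="w+") as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read()


def blocked_worker(started: threading.Event, stop: threading.Event) -> None:
    # Hypothetical stand-in for a call that never returns.
    started.set()
    stop.wait()


started = threading.Event()
stop = threading.Event()
worker = threading.Thread(target=blocked_worker, args=(started, stop), daemon=True)
worker.start()
started.wait()          # make sure the worker is inside blocked_worker

report = dump_python_threads()

stop.set()              # release the worker so the script can exit cleanly
worker.join()
print(report)
```

Threads created inside a C library never appear in this output, which is why a native debugger such as GDB is still needed to see them.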
This doesn't appear to be related to #55. I was able to reproduce this, and it's deadlocking in
I'm reproducing using:
import pynng
import logging

logging.basicConfig(level=logging.DEBUG)
addr = "tcp://localhost:9898"
count = 0
while True:
    with pynng.Pair1(listen=addr) as s0, pynng.Pair1(dial=addr) as s1:
        print("send")
        s0.send(b'hello')
        print("recv")
        s1.recv()
    count += 1
    print(count)
This very well could be an nng internal deadlock, but would need to debug a little deeper to be sure. |
For the last bug of @wtfuzz I think this is because the previously opened socket is not always closed after each cycle of the while loop, and you can't listen on the same IP port twice. So it is blocking on
|
@JoelStienlet I think adding the explicit gc is just adding some delay and not triggering the race condition that leads to deadlock. The deadlock is in a close, so it's not actually looping around and trying to listen again. Attempting to listen on an already bound endpoint would cause bind to fail and propagate up to throw an exception. |
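To illustrate the point about binding: at the OS level, a second bind to an address that already has a listener fails immediately instead of blocking. This is a minimal sketch using plain TCP sockets from the standard library, not pynng or nng; how pynng surfaces the failure is not shown here, and `second_bind_error` is a name made up for the example.

```python
import errno
import socket
from typing import Optional


def second_bind_error(host: str = "127.0.0.1") -> Optional[OSError]:
    """Listen on a free port, then try to bind the same port again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as first:
        first.bind((host, 0))            # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as second:
            try:
                second.bind((host, port))
            except OSError as exc:
                return exc               # fails right away, no blocking
    return None


err = second_bind_error()
print(err)
if err is not None:
    print(err.errno == errno.EADDRINUSE)
```

A hang, by contrast, produces no exception at all, which is consistent with the deadlock being somewhere other than the listen call.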
Possibly related nanomsg/nng#1219 |
@wtfuzz you're right!
|
I love having other people comment on these issues! Thanks. I think that linked issue looks really promising @wtfuzz. It wouldn't be the first time that the tests in pynng trigger an edge condition in nng, either. |
The order of destruction between s0 and s1 may be important too: I observe a huge difference depending on the position of the forced garbage collecting in the following code (just uncomment one of the two gc.collect(), then run again with the other uncommented):
When the dialler is destroyed before the listener I've not seen a crash yet (but I may lack statistics here) |
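For anyone experimenting with close order: in a `with a as s0, b as s1:` statement, Python exits the context managers in reverse order, so `s1` is closed before `s0`. The sketch below shows this with a stand-in class (`FakeSocket` is hypothetical and only records the order; it is not pynng), and shows one way to pick the order explicitly with `contextlib.ExitStack`. It only demonstrates Python's ordering rule and makes no claim about which order avoids the hang.

```python
from contextlib import ExitStack

closed = []


class FakeSocket:
    """Hypothetical stand-in for a socket; records when it is closed."""

    def __init__(self, name):
        self.name = name

    def close(self):
        closed.append(self.name)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False


# Same shape as the reproducer: the last context entered is exited first.
with FakeSocket("listener") as s0, FakeSocket("dialer") as s1:
    pass
default_order = list(closed)

# ExitStack runs callbacks in reverse order of registration, so the close
# order can be chosen independently of the order the sockets were opened.
closed.clear()
with ExitStack() as stack:
    s0 = FakeSocket("listener")
    s1 = FakeSocket("dialer")
    stack.callback(s1.close)   # registered first, runs last
    stack.callback(s0.close)   # registered last, runs first
swapped_order = list(closed)

print(default_order)  # ['dialer', 'listener']
print(swapped_order)  # ['listener', 'dialer']
```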
I was able to reproduce this issue in plain ol' C:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <nng/nng.h>
#include <nng/protocol/pair0/pair.h>

#define CHECK(x) do { \
    int ret = (x); \
    if (ret != 0) { \
        printf("error %d: %s\n", ret, nng_strerror(ret)); \
        abort(); \
    } \
} while (0)

char addr[] = "tcp://localhost:9899";

int main() {
    char buf[50];
    for (int i = 0; ; i++) {
        size_t got = sizeof buf;
        nng_socket s0, s1;
        CHECK(nng_pair_open(&s0));
        CHECK(nng_listen(s0, addr, NULL, 0));
        CHECK(nng_pair_open(&s1));
        CHECK(nng_dial(s1, addr, NULL, 0));
        CHECK(nng_send(s0, "hello", 5, 0));
        CHECK(nng_recv(s1, buf, &got, 0));
        printf("%d\n", i);
        nng_close(s1);
        nng_close(s0);
    }
    return 0;
}
This is the C equivalent of what the Python version was doing. (Well, the C is a little simpler; in Python we call nng_recv with the flag NNG_ALLOC...) I'll open an issue on upstream nng. |
Nice! |
nng fixed this upstream (thanks Garrett!) but this bug will be present in pynng until the |
nng fixed this upstream and it's in v0.6.0 now. Hurray! |
There are intermittent deadlocks(?) when the test test_can_send_from_pipe runs. This causes testing to be a pain, and also prevents CI from being useful at all.
I suspect (but am really not sure) that this is related to a race condition between pipe callbacks being called and a socket closing. An unlikely but possible outcome is that libnng has a deadlock. Most likely it's on our end though.
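One general way to keep an intermittent hang from stalling a whole test run is to execute the risky code in a separate interpreter with a timeout, since a thread stuck inside a native call cannot be interrupted from Python. This is a sketch under that assumption, not what pynng's test suite does; `run_with_timeout` and the two code snippets are made up for the example.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> str:
    """Run `code` in a fresh interpreter; report 'ok', 'failed' or 'hung'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


quick = run_with_timeout("print('done')", timeout=30)
stuck = run_with_timeout("import time; time.sleep(60)", timeout=1)
print(quick, stuck)
```

With this shape a deadlocked case shows up as a distinct "hung" result instead of blocking every test after it.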