Distributed hello world fails when using jemalloc #1190
Comments
@jreback no, they don't seem to be related here. Sadly I cannot reproduce this problem on my machines (neither OS X nor Debian 7+). There seems to be some relation to the old Ubuntu 12.04 image @bluenote10 is running on, which ships with jemalloc 2.2.x and glibc 2.13.
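As an aside, the libc version of a given environment can be checked directly from Python; a minimal sketch using only the standard library (the printed value is an example, not taken from this thread):

```python
import platform

# Reports the C library the Python executable is linked against,
# e.g. ('glibc', '2.13') on an Ubuntu 12.04 image.
print(platform.libc_ver())
```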
I cannot reproduce on Ubuntu 16.04 either. In any case, this doesn't seem to be a Distributed bug, so I'm inclined to close this issue unless you have significant reason to believe Distributed is involved.
I'm not sure. Similar to the other issue, it seems to affect the networking. Maybe you can make more sense of the full traceback.
The fact that networking may be affected doesn't imply that Distributed is the culprit. Distributed is pure Python code and does not depend in any particular way on the underlying C allocator.

I might add: why are you using jemalloc? Did you get specific performance improvements using it?
Running under Valgrind produces output that leads up to the same traceback.
I had fewer issues resulting from memory fragmentation with jemalloc, but I should be able to use glibc as well.
@pitrou The Arrow issues were solely a build environment issue that surfaced while loading a library; once the library is successfully loaded, that glibc bug is not triggered anymore. We're using jemalloc in general (note: I work at the same place as @bluenote10 ;)) as it has less memory fragmentation than glibc and better multithreaded performance. In Apache Arrow (a different use case than in this issue), we also use it because it can provide aligned memory (re-)allocation, which enables us to use faster numeric CPU instructions.
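To illustrate the alignment point, a minimal sketch assuming a reasonably recent pyarrow build (the 64-byte figure is Arrow's default buffer alignment, not something stated in this thread):

```python
import pyarrow as pa

# Buffers allocated from Arrow's memory pool are 64-byte aligned by
# default, which is what allows SIMD-friendly numeric kernels to
# operate on them directly.
buf = pa.allocate_buffer(1024)
print(buf.address % 64)  # 0, i.e. the allocation is 64-byte aligned
```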
Ok. Still, I don't know what to do with this issue. Using a different memory allocator shouldn't mess with network communications implemented in pure Python, unless there's something seriously wrong in low-level routines (I mean routines implemented in C, either inside Python itself or inside system libraries and/or third-party C libraries). As for the Valgrind output, part of it looks fishy, but a Valgrind or glibc expert would have to dig in.
Sure, we can close it if there is nothing that can be done. Just wanted to keep you posted on the issue.
Observation made in relation to #1179.
Running a local distributed executor, as in the "hello world" example, crashes if the allocator is jemalloc.
Starting an IPython shell via

```
LD_PRELOAD=/usr/lib/libjemalloc.so.1 ipython
```
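A quick way to verify that the preload actually took effect, assuming Linux (this check is an illustrative aside, not part of the original report):

```python
# Run inside the IPython session: is jemalloc mapped into this process?
with open("/proc/self/maps") as maps:
    print(any("jemalloc" in line for line in maps))
```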
and running the following code crashes with `ValueError: Worker not started`, whereas the same code works fine using glibc.
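The exact snippet is not preserved above; a minimal sketch in the spirit of the distributed quickstart, where the local `Client` and the `inc` helper are illustrative assumptions rather than the reporter's exact code:

```python
from distributed import Client

def inc(x):
    # Trivial task, enough to exercise scheduler/worker communication.
    return x + 1

client = Client()                # starts a local scheduler and workers
future = client.submit(inc, 10)
print(future.result())           # expected: 11
```

Under the jemalloc preload this reportedly fails during local worker startup with the `ValueError` above.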