libtbb.so appears to be incompatible with dlmopen(). It's admittedly a niche use case, but a library or plug-in may want to link against TBB in isolation from a larger program.
There appear to be a few factors at play:
(1) libtbb.so attempts to dlopen() various other libraries (iomp5, tcm) as part of its static initialization logic. At least on Linux, libdl.so appears to use setjmp()/longjmp()-style non-local jumps for error handling (note the nested _dl_catch_error() frames in the trace below). As a result, a failing inner dlopen() returns control to the outer dlmopen() call. So if the inner dlopen("libtcm.so") or dlopen("libiomp5.so") fails, dlopen() does not simply return NULL as expected; control is transferred directly to the outer dlmopen() instead. Curiously, this does not happen with nested dlopen() calls, so it may be a bug in dlmopen() itself, but I have not been able to locate any existing bug tracker references. Related glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31164
Below is the stack trace immediately before the call to dlopen("libtcm.so.1"). A single step shows control going directly back to dlmopen(), instead of dlopen() returning to its caller as expected.
Before:
#0 0x00007ffff6b00050 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#1 0x00007ffff6d16a78 in global_symbols_link (required=11, descriptors=0x7ffff6f36440 <tbb::detail::r1::(anonymous namespace)::tcm_link_table>, library=0x7ffff6d2c425 "libtcm.so.1") at oneTBB/src/tbb/dynamic_link.cpp:391
#2 _ZN3tbb6detail2r112dynamic_linkEPKcPKNS1_23dynamic_link_descriptorEmPPvi (library=library@entry=0x7ffff6d2c425 "libtcm.so.1", descriptors=descriptors@entry=0x7ffff6f36440 <tbb::detail::r1::(anonymous namespace)::tcm_link_table>,
required=required@entry=11, handle=handle@entry=0x0, flags=flags@entry=7) at oneTBB/src/tbb/dynamic_link.cpp:467
#3 0x00007ffff6d210d3 in tbb::detail::r1::tcm_adaptor::initialize () at oneTBB/src/tbb/tcm_adaptor.cpp:241
#4 tbb::detail::r1::__TBB_InitOnce::add_ref () at oneTBB/src/tbb/main.cpp:97
#5 0x00007ffff6d0fe57 in _GLOBAL__sub_I_main.cpp.lto_priv.98 () at oneTBB/src/tbb/main.h:73
#6 0x00007ffff6d0fe8e in global constructors keyed to 65535_0_address_waiter.cpp.o.30392 () from oneTBB/install/lib64/libtbb.so
#7 0x00007ffff7de7ef2 in call_init.part () from /lib64/ld-linux-x86-64.so.2
#8 0x00007ffff7de7fe6 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#9 0x00007ffff7dec16d in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#11 0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#13 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#14 0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#16 0x000000000040086c in main (argc=1, argv=0x7fffffffe3b8) at wtf.cpp:7
After a single step:
#0 0x00007ffff7078361 in _dl_catch_error () from /lib64/libc.so.6
#1 0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#3 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#4 0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#5 0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#6 0x000000000040086c in main (argc=1, argv=0x7fffffffe3b8) at wtf.cpp:7
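The trace implies a host program of roughly this shape, with main() calling dlmopen() on libtbb.so. The original wtf.cpp was not included, so the code below is a hypothetical sketch and not the author's repro. The helper name load_isolated() and the path in the usage comment are placeholders. The sketch loads a library into a fresh namespace (LM_ID_NEWLM) and prints dlerror() on failure. That is the point where the outcome of a misrouted inner dlopen() would be observed; what dlerror() actually reports in that case is not claimed here.

```cpp
// Hypothetical host sketch (not the original wtf.cpp): load a library in
// isolation from the rest of the process via a new link-map namespace.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dlmopen() and LM_ID_NEWLM are GNU extensions
#endif
#include <dlfcn.h>
#include <cstdio>

// Returns the handle on success; on failure prints dlerror() and returns nullptr.
void* load_isolated(const char* path) {
    dlerror();  // clear any stale error state before the call
    void* handle = dlmopen(LM_ID_NEWLM, path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        const char* err = dlerror();
        std::fprintf(stderr, "dlmopen(\"%s\") failed: %s\n", path,
                     err != nullptr ? err : "(no dlerror message)");
    }
    return handle;
}

// Usage (placeholder path):
//   void* tbb = load_isolated("/path/to/libtbb.so");
```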
(2) If libiomp5.so is present, TBB opens it with RTLD_GLOBAL. The BUGS section of the dlmopen(3) man page documents why this fails inside a non-initial namespace:
"As at glibc 2.24, specifying the RTLD_GLOBAL flag when calling dlmopen() generates an error. Furthermore, specifying RTLD_GLOBAL when calling dlopen() results in a program crash (SIGSEGV) if the call is made from any object loaded in a namespace other than the initial namespace."
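The following is a hypothetical sketch of how the RTLD_GLOBAL flag in point (2) could be made conditional. The helper names namespace_of() and optional_dep_flags() are invented for illustration. dlinfo() with RTLD_DI_LMID is the glibc call that reports which namespace a handle belongs to. The sketch leaves open how TBB would determine its own namespace; the caller simply passes it in. It is also an unchecked assumption that libiomp5.so still behaves correctly when opened with RTLD_LOCAL.

```cpp
// Hypothetical helpers, not TBB's code: choose dlopen() flags for an optional
// dependency based on the link-map namespace the caller lives in.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // Lmid_t, LM_ID_BASE, dlinfo() and RTLD_DI_LMID
#endif
#include <dlfcn.h>

// Stores the namespace ID of an already-open handle in *ns; false on failure.
bool namespace_of(void* handle, Lmid_t* ns) {
    return dlinfo(handle, RTLD_DI_LMID, ns) == 0;
}

// RTLD_GLOBAL only in the initial namespace, where it is safe; everywhere
// else fall back to RTLD_LOCAL (assumption: the dependency still works).
int optional_dep_flags(Lmid_t caller_ns) {
    return RTLD_NOW | (caller_ns == LM_ID_BASE ? RTLD_GLOBAL : RTLD_LOCAL);
}
```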
(3) (Into the weeds here) If you patch libdl to work around the above bugs so that libtbb.so can be successfully dlmopen()'d, AND your initial program has called pthread_key_create(), it ultimately segfaults. The issue appears to originate in the theTLS.get() lookup in src/tbb/governor.h.
On the first call, theTLS.get() is expected to return null, which triggers initialization of TBB's per-thread state. However, due to an erroneous interaction between POSIX thread-specific data and dlmopen(), the call instead returns a garbage value that belongs to a thread-local variable in the outer linker namespace. As a result, TBB's arena is never initialized, and you end up with a segfault in src/tbb/task_group_context.cpp.
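Point (3) comes down to how per-thread state is stored. TBB's theTLS wraps pthread thread-specific data (pthread_key_create/pthread_getspecific). One plausible explanation for the garbage value is that the libc copy in the new namespace allocates keys from its own table, so TBB can be handed a key index that the host program already uses. This explains why the host's pthread_key_create() call matters, but it is an assumption and has not been verified here. The sketch below shows the compiler-based alternative: a C++11 thread_local pointer, equivalent to GCC/Clang __thread for a trivially initialized pointer, which has no key to collide on. The ThreadSlot name is hypothetical.

```cpp
#include <thread>

// Hypothetical per-thread slot sketching the thread_local alternative to a
// pthread_key_t-based wrapper. Every thread starts with its own nullptr.
struct ThreadSlot {
    static void* get() { return value_; }
    static void set(void* v) { value_ = v; }

private:
    static thread_local void* value_;  // __thread void* also works on GCC/Clang
};

thread_local void* ThreadSlot::value_ = nullptr;

// Reads the slot from a freshly started thread. The sentinel makes sure the
// result really comes from the new thread's get() call.
void* value_seen_by_new_thread() {
    void* seen = &seen;
    std::thread t([&seen] { seen = ThreadSlot::get(); });
    t.join();
    return seen;
}
```

Whatever the main thread stores, a new thread's first get() returns nullptr. That is the "first call returns null" property governor.h relies on.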
All of the factors above are ultimately due to deficiencies in other libraries. Nevertheless, there are a few things that TBB can do to sidestep the bugs:
Avoid dlopen() during static initialization, or at least provide an opt-out mechanism. This is arguably good hygiene for a library anyway; it was quite surprising that simply using a tbb::concurrent_unordered_set triggered multiple dlopen() calls and initialized a thread pool.
Use compiler-supported thread-local storage (e.g. __thread, or C++11 thread_local) instead of pthread thread-specific data where available.
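The first suggestion could look roughly like the sketch below. LazyOptionalLib is a hypothetical wrapper, not an existing TBB class. It defers dlopen() to first use and lets the embedding application opt out. Because the load no longer runs in libtbb.so's static constructors, it presumably would no longer execute nested under the outer dlmopen() and its _dl_catch_error() frame from point (1). RTLD_LOCAL is used to stay clear of the RTLD_GLOBAL crash from point (2).

```cpp
#include <dlfcn.h>
#include <mutex>

// Hypothetical wrapper for an optional dependency (something like libtcm.so.1
// or libiomp5.so); not TBB's actual code. Nothing is loaded during static
// initialization: dlopen() runs on first use, at most once, and can be
// disabled entirely by the embedding application.
class LazyOptionalLib {
public:
    LazyOptionalLib(const char* name, bool enabled) : name_(name), enabled_(enabled) {}

    // Returns the library handle, or nullptr if disabled or not found.
    void* handle() {
        std::call_once(once_, [this] {
            if (!enabled_) return;  // opt-out: never touch the dynamic loader
            ++load_attempts_;
            handle_ = dlopen(name_, RTLD_NOW | RTLD_LOCAL);
        });
        return handle_;
    }

    int load_attempts() const { return load_attempts_; }

private:
    const char* name_;
    bool enabled_;
    std::once_flag once_;
    void* handle_ = nullptr;
    int load_attempts_ = 0;
};
```

std::call_once keeps the first load thread-safe and ensures a missing library is probed only once. A function-local static would serve equally well.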
References (oneTBB commit 3b9f9ba):
(2) libiomp5.so is opened with RTLD_GLOBAL in oneTBB/src/tbb/misc_ex.cpp, line 165. This triggers the SIGSEGV described in the BUGS section of the dlmopen(3) man page: https://manpages.debian.org/testing/manpages-dev/dlmopen.3.en.html#BUGS
(3) The theTLS.get() lookup: https://github.com/oneapi-src/oneTBB/blob/3b9f9baef9c27e4d22ebeff59118aaddaa40e9f2/src/tbb/governor.h#L100C8-L100C8
(3) The resulting segfault is in oneTBB/src/tbb/task_group_context.cpp, line 182. The underlying bug is documented here: https://sourceware.org/bugzilla/show_bug.cgi?id=24776