libtbb.so cannot be used with dlmopen() #1283

Open
aws-taylor opened this issue Dec 13, 2023 · 3 comments

aws-taylor commented Dec 13, 2023

Hi TBB team,

libtbb.so appears to be incompatible with dlmopen(). It's admittedly a niche use case, but a library or plug-in may want to link against TBB in isolation from the larger program that loads it.

There appear to be a few factors at play:
(1) libtbb.so attempts to dlopen() various other libraries (iomp5, tcm) as part of its static initialization logic. At least on Linux, libdl appears to use a longjmp-style non-local jump for error handling, and a failing inner dlopen() returns control to the outer dlmopen() call. Thus, if the inner dlopen("libtcm.so.1") or dlopen("libiomp5.so") fails, instead of simply returning null as expected, control is transferred directly to the outer dlmopen(). Curiously, this does not happen with nested dlopen() calls, so it may be a bug in dlmopen() itself; I have not been able to locate any existing bug tracker references. See https://sourceware.org/bugzilla/show_bug.cgi?id=31164

Below is the stack trace on entry to the dlopen("libtcm.so.1") call. A single step then shows control going directly back to the dlmopen() frames instead of unwinding the stack as expected.

Before:

#0  0x00007ffff6b00050 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#1  0x00007ffff6d16a78 in global_symbols_link (required=11, descriptors=0x7ffff6f36440 <tbb::detail::r1::(anonymous namespace)::tcm_link_table>, library=0x7ffff6d2c425 "libtcm.so.1") at oneTBB/src/tbb/dynamic_link.cpp:391
#2  _ZN3tbb6detail2r112dynamic_linkEPKcPKNS1_23dynamic_link_descriptorEmPPvi (library=library@entry=0x7ffff6d2c425 "libtcm.so.1", descriptors=descriptors@entry=0x7ffff6f36440 <tbb::detail::r1::(anonymous namespace)::tcm_link_table>,
   required=required@entry=11, handle=handle@entry=0x0, flags=flags@entry=7) at oneTBB/src/tbb/dynamic_link.cpp:467
#3  0x00007ffff6d210d3 in tbb::detail::r1::tcm_adaptor::initialize () at oneTBB/src/tbb/tcm_adaptor.cpp:241
#4  tbb::detail::r1::__TBB_InitOnce::add_ref () at oneTBB/src/tbb/main.cpp:97
#5  0x00007ffff6d0fe57 in _GLOBAL__sub_I_main.cpp.lto_priv.98 () at oneTBB/src/tbb/main.h:73
#6  0x00007ffff6d0fe8e in global constructors keyed to 65535_0_address_waiter.cpp.o.30392 () from oneTBB/install/lib64/libtbb.so
#7  0x00007ffff7de7ef2 in call_init.part () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7de7fe6 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7dec16d in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#10 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#11 0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#12 0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#13 0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#14 0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#16 0x000000000040086c in main (argc=1, argv=0x7fffffffe3b8) at wtf.cpp:7

After a single step:

#0  0x00007ffff7078361 in _dl_catch_error () from /lib64/libc.so.6
#1  0x00007ffff7deb9a9 in _dl_open () from /lib64/ld-linux-x86-64.so.2
#2  0x00007ffff7bd6960 in dlmopen_doit () from /lib64/libdl.so.2
#3  0x00007ffff7078374 in _dl_catch_error () from /lib64/libc.so.6
#4  0x00007ffff7bd6675 in _dlerror_run () from /lib64/libdl.so.2
#5  0x00007ffff7bd6a36 in dlmopen () from /lib64/libdl.so.2
#6  0x000000000040086c in main (argc=1, argv=0x7fffffffe3b8) at wtf.cpp:7
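
To make the control flow in (1) easier to picture, here is a toy model of nested longjmp-based error handlers. It is only a sketch and not glibc's implementation: `current_handler`, `catch_error`, `signal_error`, and the two initializer functions are hypothetical names. It assumes the loader keeps track of the innermost active handler, and it shows what the outer call sees when the inner call has its own effective handler versus when it does not.

```cpp
#include <csetjmp>

// Innermost active handler; a hypothetical stand-in for the loader's catch state.
static std::jmp_buf* current_handler = nullptr;

// Report a failure by jumping to whichever handler is installed right now.
[[noreturn]] static void signal_error() {
    std::longjmp(*current_handler, 1);
}

// Run op under a fresh handler and report whether it signalled an error.
static bool catch_error(void (*op)()) {
    std::jmp_buf env;
    std::jmp_buf* const previous = current_handler;
    if (setjmp(env) != 0) {
        // We arrive here via longjmp: restore the enclosing handler and report failure.
        current_handler = previous;
        return true;
    }
    current_handler = &env;
    op();
    current_handler = previous;
    return false;
}

static bool initializer_finished = false;

// Models an optional library that is missing.
static void failing_inner_open() { signal_error(); }

// Expected nesting: the inner call fails locally and the initializer carries on.
static void initializer_with_inner_handler() {
    const bool inner_failed = catch_error(failing_inner_open);
    initializer_finished = inner_failed;
}

// Resembles the reported behavior: the jump lands in the outer handler.
static void initializer_without_inner_handler() {
    failing_inner_open();
    initializer_finished = true;  // never reached
}

// Returns true if the outer "open" completes with its initializer having run to the end.
bool outer_open_succeeds(bool inner_handler_effective) {
    initializer_finished = false;
    const bool failed = catch_error(inner_handler_effective
                                        ? initializer_with_inner_handler
                                        : initializer_without_inner_handler);
    return !failed && initializer_finished;
}
```

In this model, `outer_open_succeeds(true)` corresponds to the nested dlopen() case that works, and `outer_open_succeeds(false)` corresponds to the "after a single step" trace, where the initializer is abandoned and the outer call sees the failure.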

(2) If libiomp5.so is present, it is opened with RTLD_GLOBAL:

if ( dynamic_link( "libiomp5.so", iompLinkTable, 1, &libhandle, DYNAMIC_LINK_GLOBAL ) ) {

This triggers the segfault described in the BUGS section of the dlmopen(3) man page:

https://manpages.debian.org/testing/manpages-dev/dlmopen.3.en.html#BUGS

As at glibc 2.24, specifying the RTLD_GLOBAL flag when calling dlmopen() generates an error. Furthermore, specifying RTLD_GLOBAL when calling dlopen() results in a program crash (SIGSEGV) if the call is made from any object loaded in a namespace other than the initial namespace.
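
One possible way to sidestep (2), sketched under assumptions: a library could drop RTLD_GLOBAL whenever it is not running in the initial namespace. `safe_dlopen_flags` and `namespace_of` are hypothetical helpers, not part of TBB. `dlinfo()` with `RTLD_DI_LMID` is a glibc extension, and how a library would obtain a handle to itself is left out here.

```cpp
#include <dlfcn.h>

// Drop RTLD_GLOBAL unless the caller lives in the initial link-map namespace.
// All other requested flags are passed through unchanged.
int safe_dlopen_flags(Lmid_t caller_namespace, int requested_flags) {
    if (caller_namespace != LM_ID_BASE) {
        return requested_flags & ~RTLD_GLOBAL;
    }
    return requested_flags;
}

// Ask glibc which namespace a handle belongs to.
// Returns false if dlinfo() reports an error.
bool namespace_of(void* handle, Lmid_t* out) {
    return dlinfo(handle, RTLD_DI_LMID, out) == 0;
}
```

A caller would query its own namespace once, then pass the result to `safe_dlopen_flags` before every dlopen(). The trade-off is that symbols from the opened library are no longer globally visible in secondary namespaces, which may or may not be acceptable for the libiomp5 case.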

(3) (Into the weeds here.) If you patch libdl to work around the above bugs so that libtbb.so can be dlmopen()'d successfully, AND your initial program uses pthread_key_create(), it ultimately segfaults. The issue appears to originate in the following code:

thread_data* td = theTLS.get();
if (td) {
    return td;
}
init_external_thread();

https://github.com/oneapi-src/oneTBB/blob/3b9f9baef9c27e4d22ebeff59118aaddaa40e9f2/src/tbb/governor.h#L100C8-L100C8

On the first call, theTLS.get() is expected to return null, after which init_external_thread() sets up the thread's state. However, due to an erroneous interaction between POSIX thread-specific data and dlmopen(), the call instead returns a garbage reference to a thread-local variable in the outer linker namespace. As a result, TBB's arena is never initialized and you end up with a segfault here:

if (td->my_task_dispatcher->m_execute_data_ext.context == td->my_arena->my_default_ctx || !ctx.my_traits.bound) {

This bug is documented here:

https://sourceware.org/bugzilla/show_bug.cgi?id=24776
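
For readers unfamiliar with the pattern in (3), here is a minimal sketch of lazy per-thread initialization on top of pthread keys. It is not TBB's code: `thread_data`, `tls_key`, and `get_thread_data` are hypothetical stand-ins. The point is that the pattern relies on `pthread_getspecific()` returning null for a key the current thread has never set; if it returns anything else, the initialization branch is skipped.

```cpp
#include <pthread.h>

// Hypothetical stand-in for the per-thread state a runtime keeps.
struct thread_data {
    bool initialized;
};

static pthread_key_t tls_key;
static pthread_once_t tls_once = PTHREAD_ONCE_INIT;

// Runs at thread exit for threads that stored a non-null value.
static void destroy_thread_data(void* p) {
    delete static_cast<thread_data*>(p);
}

static void create_key() {
    pthread_key_create(&tls_key, destroy_thread_data);
}

thread_data* get_thread_data() {
    pthread_once(&tls_once, create_key);
    auto* td = static_cast<thread_data*>(pthread_getspecific(tls_key));
    if (td) {
        return td;  // fast path: this thread was initialized earlier
    }
    // Slow path: only reached when the key really reads back as null.
    td = new thread_data{true};
    pthread_setspecific(tls_key, td);
    return td;
}
```

If the lookup hands back a stale or foreign non-null pointer on the first call, as the report describes, the fast path returns it as-is and the slow path never runs.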


All of the factors above are ultimately due to deficiencies in other libraries. Nevertheless, there are a few things that TBB can do to sidestep the bugs:

  • Avoid dlopen() during static initialization, or at least provide an opt-out mechanism. This is arguably good hygiene for a library anyway; it was quite surprising that simply using a tbb::concurrent_unordered_set triggered multiple dlopen() calls and initialized a thread pool.
  • Use newer thread-local constructs (e.g. __thread or C++11 thread_local) instead of the pthread key APIs, if available.
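
A rough sketch of what the two suggestions could look like, under stated assumptions. The library name `libexample_optional_backend.so`, the environment variable `EXAMPLE_DISABLE_OPTIONAL_BACKEND`, and all function names are invented for illustration and are not TBB APIs. The optional library is loaded on first use through a function-local static rather than from a global constructor, and per-thread state uses `thread_local`. Whether `thread_local` actually avoids the glibc issue in (3) is the suggestion made above, not something this sketch verifies.

```cpp
#include <cstdlib>
#include <dlfcn.h>

// Instrumentation for the example only: counts how often a load was attempted.
static int load_attempts = 0;

static void* try_load_optional_backend() {
    ++load_attempts;
    // Hypothetical opt-out switch; the variable name is made up for this sketch.
    if (std::getenv("EXAMPLE_DISABLE_OPTIONAL_BACKEND") != nullptr) {
        return nullptr;
    }
    // RTLD_LOCAL keeps the backend's symbols out of the global scope.
    return dlopen("libexample_optional_backend.so", RTLD_LAZY | RTLD_LOCAL);
}

// Deferred load: nothing happens at library load time. The first caller
// triggers the attempt, and the result (possibly null) is cached.
void* optional_backend() {
    static void* const handle = try_load_optional_backend();
    return handle;
}

int optional_backend_load_attempts() { return load_attempts; }

// Hypothetical per-thread state using language-level thread-local storage.
struct thread_data {
    int tasks_run = 0;
};

thread_data* local_thread_data() {
    thread_local thread_data td;  // constructed on first use in each thread
    return &td;
}
```

With this shape, merely loading the library (or instantiating a container from it) performs no dlopen() at all; the cost and the risk move to the first code path that actually needs the optional backend.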

Repro:

#include <iostream>
#include <dlfcn.h>
#include <sysexits.h>

int main(int argc, char * argv[]){

  void * handle = dlmopen(LM_ID_NEWLM, "libtbb.so", RTLD_LAZY);

  if(!handle){
    std::cerr << dlerror() << std::endl;
    return EX_SOFTWARE;
  }

  return EX_OK;
}
@dnmokhov
Contributor

Thank you for submitting this and providing the reproducer and the details.

@arunparkugan

@aws-taylor is this issue still relevant?

@aws-taylor
Author

Hi @arunparkugan,

Low priority, but yes, I believe this issue still impacts tbb.
