
gperftools' stacktrace capturing methods and their issues

Aliaksiej Kandracienka (aka Aliaksei Kandratsenka) edited this page Sep 23, 2024 · 6 revisions

We capture backtraces in tricky scenarios, such as in the CPU profiler case, where we capture backtraces from a signal handler. Therefore, we cannot rely on glibc's backtrace() function. In practice today, no completely robust backtracing solution works for all use cases (but we're getting closer, thankfully). So, we offer a range of stack trace capturing methods. This page describes our options and our current (Aug 2023) experience with them.

Use the TCMALLOC_STACKTRACE_METHOD environment variable to select the backtracing implementation at runtime. We also offer the TCMALLOC_STACKTRACE_METHOD_VERBOSE environment variable, which makes gperftools print the complete set of available options and which option is active.

The simplest way to see the list of available backtracing options on your system is to run "TCMALLOC_STACKTRACE_METHOD_VERBOSE=t ./stacktrace_unittest".
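If you launch the profiled program from C (say, from a test driver) rather than from a shell, the same selection can be made by setting the variables in the child process before exec. The following is only a minimal sketch: the helper name run_with_stacktrace_method is hypothetical, and only the two environment variable names come from this page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with TCMALLOC_STACKTRACE_METHOD set to
 * `method`. If `verbose` is nonzero, TCMALLOC_STACKTRACE_METHOD_VERBOSE=t is
 * set as well. Returns the child's exit status, or -1 on failure. */
int run_with_stacktrace_method(const char *method, int verbose,
                               char *const argv[]) {
    assert(method != NULL && argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        /* Child: adjust the environment, then replace the process image. */
        if (setenv("TCMALLOC_STACKTRACE_METHOD", method, 1) != 0) _exit(127);
        if (verbose &&
            setenv("TCMALLOC_STACKTRACE_METHOD_VERBOSE", "t", 1) != 0) {
            _exit(127);
        }
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with method "generic_fp", verbose set to 1, and an argv of {"./stacktrace_unittest", NULL} would correspond to the shell command above with the frame pointers method selected.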

Frame pointers

TLDR: Use TCMALLOC_STACKTRACE_METHOD=generic_fp when all code is compiled with frame pointers, and you'll get nearly always correct and robust backtraces.

The simplest of all options is to rely on frame pointers. But this requires that all relevant code is compiled with frame pointers enabled. Most architectures default to omitting frame pointers because maintaining them imposes a small but noticeable performance penalty. So, to use frame pointers backtracing, you need to compile your code with options that enable them explicitly.

We now offer a single unified implementation that covers multiple architectures (but not yet PowerPC; we need people who care about this architecture to contribute). Completely tested and supported architectures are Linux/x86 (all variants), Linux/aarch64, and Linux/riscv. It is worth noting that legacy 32-bit ARM targets don't currently have functional frame pointers backtracing on GCC.

This implementation is selected by default if the --enable-frame-pointers flag is given to the configure script. You can also select the frame pointers method by setting the environment variable TCMALLOC_STACKTRACE_METHOD=generic_fp or TCMALLOC_STACKTRACE_METHOD=generic_fp_unsafe. The former variant performs extra pointer checks to avoid rare crashes when dealing with code that uses the frame pointer register for other purposes (e.g. when code is compiled with "-fomit-frame-pointer"). The latter (unsafe) variant doesn't do those checks but runs substantially faster. The safe variant is the default.
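To make the idea of "extra pointer checks" concrete, here is a small sketch of a checked frame-chain walk. It is not gperftools' actual implementation. It assumes the frame record layout used on x86-64 and aarch64 (saved caller frame pointer followed by the return address), and it assumes the caller already knows the valid stack range [lo, hi). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame record layout: word 0 is the caller's frame pointer,
 * word 1 is the return address. */
typedef struct frame_record {
    struct frame_record *caller_fp;
    void *return_address;
} frame_record;

/* Returns nonzero if `fp` is aligned and a whole record fits in [lo, hi). */
static int plausible_frame(uintptr_t fp, uintptr_t lo, uintptr_t hi) {
    if (fp % sizeof(void *) != 0) return 0;
    if (fp < lo || fp > hi || hi - fp < sizeof(frame_record)) return 0;
    return 1;
}

/* Walks the frame chain starting at `fp`, storing return addresses in `out`.
 * Stops at the first frame pointer that fails a sanity check. */
size_t walk_frames_checked(const frame_record *fp, uintptr_t lo, uintptr_t hi,
                           void **out, size_t max_depth) {
    size_t n = 0;
    assert(out != NULL || max_depth == 0);

    while (n < max_depth && plausible_frame((uintptr_t)fp, lo, hi)) {
        out[n++] = fp->return_address;
        const frame_record *next = fp->caller_fp;
        /* The stack grows down, so a caller's record must sit at a strictly
         * higher address. This also guarantees the loop terminates. */
        if ((uintptr_t)next <= (uintptr_t)fp) break;
        fp = next;
    }
    return n;
}
```

A walk without these checks would dereference whatever value happens to be in the frame pointer register, which is the crash risk the unsafe variant accepts in exchange for speed.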

The frame pointers backtracing method is very fast (especially the unsafe variant). It can never deadlock, and the default safe variant does not crash either. But it will occasionally produce incorrect backtraces, particularly for the CPU profiler. Two cases of incorrect backtraces are worth describing.

In the first case, the CPU profiler occasionally interrupts the program in the middle of a function prologue or epilogue. In that case, we get the instruction pointer of one function and the frame pointer of its caller. That produces a backtrace with the second frame completely missing. This effect can be confusing and annoying in CPU profiles. It usually affects only a small fraction of CPU profile samples, but it is worth keeping in mind.
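The missing frame is easy to see in a toy model. The sketch below simulates a call chain _start -> main -> parse -> leaf with made-up frame records (nothing here touches the real stack, and all names are hypothetical). Sampling inside leaf's body uses leaf's own frame record. Sampling inside leaf's prologue still sees parse's frame record in the frame pointer register, so parse drops out of the backtrace.

```c
#include <assert.h>
#include <stddef.h>

enum sim_fn { FN_START, FN_MAIN, FN_PARSE, FN_LEAF };

typedef struct sim_frame {
    const struct sim_frame *caller_fp; /* saved frame pointer of the caller */
    enum sim_fn return_into;           /* function the return address is in */
} sim_frame;

/* Builds a backtrace the way a profiling signal handler would: first the
 * sampled program counter, then one return address per frame record. */
static size_t sim_backtrace(enum sim_fn pc, const sim_frame *fp,
                            enum sim_fn *out, size_t max) {
    size_t n = 0;
    if (n < max) out[n++] = pc;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_into;
        fp = fp->caller_fp;
    }
    return n;
}

/* Simulates a sample that lands in leaf(). If `in_prologue` is nonzero, leaf
 * has not yet set up its own frame record. */
size_t sim_sample_leaf(int in_prologue, enum sim_fn *out, size_t max) {
    const sim_frame main_frame  = { NULL,         FN_START };
    const sim_frame parse_frame = { &main_frame,  FN_MAIN  };
    const sim_frame leaf_frame  = { &parse_frame, FN_PARSE };
    assert(out != NULL);

    /* Before the prologue completes, the frame pointer register still
     * points at the caller's (parse's) frame record. */
    const sim_frame *fp = in_prologue ? &parse_frame : &leaf_frame;
    return sim_backtrace(FN_LEAF, fp, out, max);
}
```

With in_prologue set to 0 the result is leaf, parse, main, _start. With in_prologue set to 1 it is leaf, main, _start. The same model applies to a leaf routine that never sets up a frame record at all.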

In the second case, the CPU profiler occasionally interrupts the program in functions without frame pointers. This often happens in standard but simple assembly routines, such as memcpy, memset, strcmp, etc. Those routines usually don't touch the frame pointer register, so the effect is the same as in case one above (missing parent of the leaf frame). Another variant of this case happens when the CPU profiler interrupts the program in functions that use the frame pointer register for something other than a frame pointer. That occurs in some functions compiled with the "-fomit-frame-pointer" option. A bogus frame pointer value leads, in the best case, to truncated backtraces; in the worst case, it may crash the program (if the unsafe backtracing method is selected).

Starting from version 2.11, we compile gperftools with "-fno-omit-frame-pointer -momit-leaf-frame-pointer" (when supported by the compiler, and unless overridden by the "--enable-frame-pointers" configure switch). This configuration generates machine code compatible with frame pointers backtracing for all purposes other than CPU profiling, without imposing performance penalties on the critical malloc & free fast-path code. That is, leaf functions, at least in our malloc implementation, are where forcing frame pointers would be most taxing, and yet backtracing from those leaf functions is only ever needed by the CPU profiler.

Libunwind

TLDR: This is our default. Also available via TCMALLOC_STACKTRACE_METHOD=libunwind environment variable. But it has occasionally upset people with crashes and deadlocks.

All modern architectures have ABIs that default to not having frame pointers. Instead, we're supposed to use various external "unwind info" metadata. This usually relies on a facility originally introduced for exceptions. On ELF systems, this facility typically uses the .eh_frame section. The data format is similar but not identical to DWARF unwind info (introduced to allow debuggers to show you backtraces) and is documented in gABI specs.

No extra compiler switches are necessary for most modern architectures. The relevant GCC switches are "-funwind-tables" (or "-fexceptions") and "-fasynchronous-unwind-tables" (for the CPU profiler; see the GCC docs for details). As noted above, those are usually enabled by default.

Libunwind is one prominent library that allows us to capture backtraces using this metadata.

Our libunwind backtracing method has the following pros and cons. Cons:

  • Relatively slow, on the order of 1k cycles per frame
  • Occasionally, compilers have bugs in unwind info, which causes libunwind crashes. We also see occasional fixes in libunwind itself, so some of those crashes may be due to libunwind bugs. We recommend using the latest libunwind version for best results.
  • As of this writing, no configuration is entirely robust w.r.t. CPU profiling signal handler.

Pros:

  • When it works, the outcome is excellent without needing to recompile anything with special flags and without any downsides.

Generally, libunwind promises async-signal-safety for capturing backtraces. We need this for the CPU profiler case. But in practice, it is only partially async-signal-safe due to its reliance on the dl_iterate_phdr API, which is used to enumerate all loaded ELF modules (.so files and the main executable binary). No libc offers a dl_iterate_phdr that is async-signal-safe. In practice, the issue may happen if we take a CPU profiling signal during an existing dl_iterate_phdr call (such as when the program throws an exception) or while some .so module is being dlopen-ed or dlclose-d. There is a workaround for that kind of crash at https://github.com/alk/unwind_safeness_helper/tree/master. The alternative is fully static linking.
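For readers who have not used it, this is roughly what enumerating loaded modules with dl_iterate_phdr looks like. The sketch assumes glibc (or another libc that declares the function in <link.h>) and calls it from ordinary code, not from a signal handler, precisely because of the safety issue described above. The function names list_loaded_modules and print_module are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only declares dl_iterate_phdr with this set */
#endif
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Callback invoked once per loaded ELF object. Returning 0 continues the
 * iteration; a nonzero return would stop it. */
static int print_module(struct dl_phdr_info *info, size_t size, void *data) {
    size_t *count = data;
    (void)size;

    /* The main executable is typically reported with an empty name. */
    const char *name = (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
                           ? info->dlpi_name
                           : "(main executable or unnamed)";
    printf("module %zu: base=%p phdrs=%u name=%s\n", *count,
           (void *)info->dlpi_addr, (unsigned)info->dlpi_phnum, name);
    ++*count;
    return 0;
}

/* Prints every loaded ELF object and returns how many were seen. */
size_t list_loaded_modules(void) {
    size_t count = 0;
    dl_iterate_phdr(print_module, &count);
    return count;
}
```

An unwinder uses the same walk to find which module, and therefore which unwind info, covers a given instruction pointer. The libc typically holds an internal loader lock for the duration of the iteration, which is why a signal arriving in the middle of it is a problem.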

As for crashes, most are believed to be caused by compiler bugs, but if you encounter one, consider filing tickets with both the libunwind folk and your compiler vendor. Ideally, libunwind should at least have an option to be robust even when dealing with broken unwind info.

We used to have another issue on ARMs, where libunwind calls into malloc when backtracing. But the "--enable-emergency-malloc" configure flag is enabled now and should cover this case.

libgcc's _Unwind_Backtrace

TLDR: Use TCMALLOC_STACKTRACE_METHOD=libgcc, and enable it by default via the "--enable-libgcc-unwinder-by-default" configure flag when running on a very recent Linux system.

Another library that can produce backtraces from unwind info is "libgcc_s.so". But since its primary purpose is exceptions, it hasn't always been fully robust, especially for capturing backtraces from signal handlers.
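As an illustration of the interface involved, here is a minimal sketch of capturing a backtrace with _Unwind_Backtrace from <unwind.h>. It is called from ordinary code here; whether it is safe from a signal handler depends on the library versions discussed in this section. The names capture_with_libgcc, trace_state and on_frame are hypothetical, and this is not how gperftools itself wires it up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unwind.h>

typedef struct trace_state {
    void **pcs;   /* output buffer for instruction pointers */
    size_t count; /* frames stored so far */
    size_t max;   /* capacity of pcs */
} trace_state;

/* Called by the unwinder once per frame, innermost first. */
static _Unwind_Reason_Code on_frame(struct _Unwind_Context *ctx, void *arg) {
    trace_state *st = arg;
    if (st->count >= st->max) {
        return _URC_END_OF_STACK; /* any non-NO_REASON code stops the walk */
    }
    st->pcs[st->count++] = (void *)(uintptr_t)_Unwind_GetIP(ctx);
    return _URC_NO_REASON;
}

/* Stores up to `max` instruction pointers in `pcs`; returns how many. */
size_t capture_with_libgcc(void **pcs, size_t max) {
    trace_state st = { pcs, 0, max };
    assert(pcs != NULL || max == 0);
    _Unwind_Backtrace(on_frame, &st);
    return st.count;
}
```

The unwinder needs unwind info for every frame it crosses, so how deep the result goes depends on how the surrounding code was compiled.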

However, the most recent versions of this library (starting from gcc 12), running on very recent Linux distros (glibc version 2.35 and later), have been robust in our testing so far. This is thanks to glibc's dl_find_object API, which solves the problem of async-signal-safe access to the set of loaded ELF modules. We recommend enabling it by default, but only on systems that use the dl_find_object API. With that said, please note that we don't yet have much experience with how prone this facility is to crashing when it faces incorrect unwind info.

As of September 2024, one such crash scenario is known. For some reason, glibc's i386 memcpy implementation appears to provide bogus unwind info or something similar. Backtracing from there triggers an attempt to access the parent frame's EBP register, and if the parent is built without frame pointers (-fomit-frame-pointer), we get a crash.
