Skip to content

Rebuild musl with --enable-debug #26

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
wants to merge 1 commit into from

Conversation

yakov-olkhovskiy
Copy link
Member

Required rebuild with ./configure --enable-debug to produce complete backtrace.

closes ClickHouse/ClickHouse#67921

@yakov-olkhovskiy
Copy link
Member Author

@Algunenano could you please review this?

@@ -54,7 +54,7 @@ docker export b38a367a8a05 > ppc64.tar

The ubuntu version 14.04 is selected for better compatibility.

- for `x86_64-musl` they are from https://musl.cc/. Last updated from build 23-Nov-2021 04:50 (7e2fa1cbc6b6c23d15e7b65015777e89 x86_64-linux-musl-native.tgz).
- for `x86_64-musl` they are from https://musl.cc/. Last updated from build 23-Nov-2021 04:50 (7e2fa1cbc6b6c23d15e7b65015777e89 x86_64-linux-musl-native.tgz). Binaries `crt1.o`, `crti.o`, `libc.a` are rebuilt from https://git.musl-libc.org/ (commit dd1e63c3) configured with `--enable-debug`.
Copy link
Member

@Algunenano Algunenano Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several issues:

  • Unless it's really necessary I'd avoid using top of the HEAD as a reference, specially taking into account that the commit is about a revert. It's better to stick to a stable release.
  • Please update headers and binaries together, otherwise they don't match.
  • I'm guessing that you haven't setup build flags, so you are likely building an unoptimized version of musl. Please include build flags. For example the ones used in the official binaries: https://conf.musl.cc/plain_20211122_11-2-1.txt

@al13n321
Copy link
Member

al13n321 commented Aug 6, 2024

I tried it, and it's not enough, clickhouse failed to unwind out of signal handler.

(Specifically, I built musl-1.2.5 with ./configure --enable-debug && make, copied the headers and libs to contrib/sysroot/linux-x86_64-musl/, built clickhouse with cmake .. -DUSE_MUSL=1 -DCMAKE_TOOLCHAIN_FILE=../cmake/linux/toolchain-x86_64-musl.cmake && ninja -j30, then ran clickhouse local, then did kill -s SIGSEGV on it, and it printed stack trace with just two frames ending with ./build/./src/Common/SignalHandlers.cpp:83: signalHandler.)

There are multiple problems, I didn't get to the bottom of it yet, will keep looking.

  • Musl has a strong opinion that self-unwinding is bad, so there's no option to enable .eh_frame: https://inbox.vuxu.org/musl/20210718111655.6ec25889@vostro/t/ - "backtrace() considered harmful." "Having application unwind itself for backtrace printing purposes especially in signal handler is bad." "Doing introspective debugging when a crash happens is inviting exploitation"
  • --enable-debug only produces .debug_frame, which our libunwind doesn't seem to support.
  • We can force .eh frame anyway:
    • Comment out this line in tools/add-cfi.x86_64.awk: print ".cfi_sections .debug_frame"
    • CFLAGS='-funwind-tables -fasynchronous-unwind-tables' ./configure --enable-debug
    • This produced .eh_frame and no .debug_frame, but unwinding still didn't work.
  • https://maskray.me/blog/2022-04-10-unwinding-through-signal-handler#musl-x86-64 says that signal trampoline flag is missing in musl's debug info. But I tried adding .cfi_signal_frame to src/signal/x86_64/restore.s, and it still didn't work.
  • I think part of the problem is that musl doesn't call our signal handler using call. It does something else and sets return pointer to the first instruction of __restore_rt(). Then libunwind subtracts 1 from that address (on the assumption that return address points to the next instruction after a call) and ends up on the nop outside __restore_rt, which is not covered by unwind tables.
  • But even if that is fixed, I think it won't work because there's no unwind information about extracting register values out of the context struct. That would need to be written manually, I guess. The post above says it works in gdb with -fno-omit-frame-pointer, but probably gdb has special code path to retrieve registers from signal context, and libunwind probably doesn't. EDIT: Actually no, I'm confused how gdb can work here, there's nowhere to get this information without unwind info; will look into that.
  • Unrelated and less important: musl also has incorrect unwind info for __clone(), so unwinding ends up with infinitely repeating __clone()s at the end of stack trace. That's mostly ok, we have a limit on stack size, but it can be fixed. Or worked around in libunwind by stopping if the next frame has the same ip and sp as the current one.

Oof.

@yakov-olkhovskiy
Copy link
Member Author

yakov-olkhovskiy commented Aug 7, 2024

If it's that incompatible with unwind I'm not sure we need it at all.
@alexey-milovidov do you think we should try to resolve these issues or just get rid of musl?

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Aug 7, 2024

Let's try to patch Musl to support stack unwinding.
There are a lot of benefits if we are able to migrate to Musl eventually.

But if it's too hard, let's drop it.

PS. It is also makes sense to look at Cosmopolitan Libc, but last time I tried, it was very immature, and cannot even build a C++ application. Even the support for pthread came out just recently.

PS. There is also LLVM Libc, which is developed in Google at a dog-slow pace, and most likely not ready at all.

@al13n321
Copy link
Member

al13n321 commented Aug 8, 2024

Turns out GDB recognizes the exact machine code of this __restore_rt function (mov $15, %rax syscall) and has special unwind logic for it.

So our options are:
a. Patch musl to produce unwind information for __restore_rt similar to glibc. But I imagine the reply may be "gdb already works, libunwind already doesn't work, and that's exactly how we want it; libunwind not working is a feature".
b. Patch libunwind to recognize __restore_rt similar to gdb.
c. Add out own signal handler wrapper with its own unwind information that skips libc/musl trampoline.

Claude guesses 30% probability that musl upstream would accept such patch, and 60% probability for llvm-libunwind. Sounds about right. I'll make the libunwind change.

@alexey-milovidov
Copy link
Member

alexey-milovidov commented Aug 8, 2024

We will maintain our fork of Musl; there is no need to care about upstream; the same goes for libunwind. Actually, we want to fork Musl (to remove most of the unused functions, add instrumentation, etc).

Patching libunwind is also ok.

@al13n321
Copy link
Member

al13n321 commented Aug 9, 2024

ClickHouse/libunwind#32 and ClickHouse/libunwind#33

Musl needs to be built with:

  • print ".cfi_sections .debug_frame" commented out in tools/add-cfi.x86_64.awk (and optionally in tools/add-cfi.i386.awk)
  • -funwind-tables -fasynchronous-unwind-tables in CFLAGS (configure also unconditionally adds the opposite flags, so the compiler command lines end up with contradictory flags, but apperently CFLAGS take precedence, presumably because they're appended after the default flags)
  • --enable-debug (required even for unwinding, because otherwise configure doesn't run the awk scripts that add unwind info to asm files)

@al13n321
Copy link
Member

Did that in #28

@al13n321 al13n321 closed this Aug 14, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stacktraces are broken in Fasttest
4 participants