Greatly decrease the size of rustc_driver.so when debuginfo is enabled #110221
r? @ozkanonur (rustbot has picked a reviewer for you, use r? to override)
Done: https://rust-lang.zulipchat.com/#narrow/stream/317568-t-compiler.2Fwg-debugging/topic/investigating.20debuginfo.20size/near/348854698 (Separately, I want to enable frame pointers unconditionally so that |
I think frame pointers aren't enough to get inlined functions, which can be pretty important. But I could be wrong about that.
@jyn514 I wasn't suggesting it for all functions, just for a couple trivial |
I see that this is using -gz to compress debuginfo -- that implies gzip, right? Maybe we can use zstd or similar for even better wins, though perhaps tooling support is less mature there. I'm also having trouble running this locally. I seem to get this error with debuginfo enabled:
Can you confirm whether I'm just testing the wrong way or something else is wrong? Here is the command that's getting executed:
Yes, that implies gzip.
So,
I can't find a detailed description like this in the clang documentation, but I think it behaves the same: https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-gz I would rather not try to be smarter than the C compiler driver - I think it will be hard to maintain, the compression benefit won't be very high (https://rust-lang.zulipchat.com/#narrow/stream/122651-general/topic/precedent.20for.20linker.20feature.20detection.3F/near/348646500), and libbacktrace doesn't support it (and adding that support will itself increase the size of the generated binary).
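The "only add `-gz` if it's supported" idea from this thread can be sketched as a probe that simply asks the C compiler driver to build a trivial program with the flag. This is a minimal sketch under stated assumptions, not the code bootstrap uses: `compiler_accepts_flag` and the scratch directory name are hypothetical, and it assumes a gcc/clang-style driver that accepts `<flag> <src> -o <out>`.

```rust
use std::env;
use std::fs;
use std::process::{Command, Stdio};

/// Returns true if `compiler` can compile and link a trivial C program with `flag`.
/// Any failure (compiler missing, flag rejected, I/O error) counts as "unsupported".
/// Hypothetical helper for illustration only.
fn compiler_accepts_flag(compiler: &str, flag: &str) -> bool {
    // Hypothetical scratch location; a real build system would use its own out dir.
    let dir = env::temp_dir().join(format!("gz-probe-{}", std::process::id()));
    if fs::create_dir_all(&dir).is_err() {
        return false;
    }
    let src = dir.join("probe.c");
    let out = dir.join("probe.out");

    // Compile *and link*, so that assembler or linker rejections are also caught.
    let ok = fs::write(&src, "int main(void) { return 0; }\n").is_ok()
        && Command::new(compiler)
            .arg(flag)
            .arg(&src)
            .arg("-o")
            .arg(&out)
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()
            .map(|status| status.success())
            .unwrap_or(false);

    // Best-effort cleanup; errors here do not change the answer.
    let _ = fs::remove_dir_all(&dir);
    ok
}

fn main() {
    let cc = env::var("CC").unwrap_or_else(|_| "cc".to_string());
    if compiler_accepts_flag(&cc, "-gz") {
        println!("{cc} accepts -gz");
    } else {
        println!("{cc} does not accept -gz (or could not be run)");
    }
}
```

Deferring to the driver like this keeps the decision in one place: whatever the installed toolchain does with `-gz` is what the build gets, and a missing or failing compiler is treated the same as an unsupported flag.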
Oh that's my bad, sorry, I forgot rust-lang/cargo#11958 didn't make it into 1.70 beta. This is blocked for another 6 weeks in that case.
Ah, if that is done for us, great. I guess maybe it'll still break with e.g. windows msvc toolchain, but we can deal with that then. Sounds good on waiting another cycle here.
@bors r+ I think this is a good step to take -- I don't think it's enough by itself that we can enable debuginfo by default, but still, a great win for everyone building rustc locally.
📌 Commit c5c439188764c4dadc82ae451025535359d8aea6 has been approved by It is now in the queue for this repository. |
i think this is a bug in gcc actually? its detection is wrong for mingw :( i can either switch bootstrap from using |
- Only add -gz if it's supported
- Don't include extra unnecessary debuginfo when only debuginfo-level=1 is set
- Compress debuginfo sections to reduce the size of debuginfo on disk.

before: 650 MB
line tables only: 335 MB
compressed only: 216 MB
compressed and line tables: 186 MB
no debuginfo at all: 130 MB

I want to investigate why `-C line-tables-only` is still ~tripling the size of the binary, but this seems like a good improvement in the meantime.

I've tested that both valgrind and perf can read the debuginfo:
```
(bash@dev-desktop-us-1.infra.rust-lang.org) ~/rust [08:31:08]
; valgrind $(rustup which rustc --toolchain rust_stage2) --version
==441671== Memcheck, a memory error detector
==441671== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==441671== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==441671== Command: /home/gh-jyn514/.local/lib/rustup/toolchains/rust_stage2/bin/rustc --version
==441671==
rustc 1.70.0-dev
==441671==
==441671== HEAP SUMMARY:
==441671==     in use at exit: 231,289 bytes in 1,874 blocks
==441671==   total heap usage: 2,538 allocs, 664 frees, 486,368 bytes allocated
==441671==
==441671== LEAK SUMMARY:
==441671==    definitely lost: 70,656 bytes in 1 blocks
==441671==    indirectly lost: 0 bytes in 0 blocks
==441671==      possibly lost: 0 bytes in 0 blocks
==441671==    still reachable: 160,633 bytes in 1,873 blocks
==441671==         suppressed: 0 bytes in 0 blocks
==441671== Rerun with --leak-check=full to see details of leaked memory
==441671==
==441671== For lists of detected and suppressed errors, rerun with: -s
==441671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
; perf record $(rustup which rustc --toolchain rust_stage2) --version
rustc 1.70.0-dev
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.005 MB perf.data (70 samples) ]
; perf report
Samples: 70 of event 'cycles:u', Event count (approx.): 21356967
Overhead  Command  Shared Object                        Symbol
  51.55%  rustc    ld-linux-aarch64.so.1                [.] _dl_lookup_symbol_x
  18.70%  rustc    ld-linux-aarch64.so.1                [.] _dl_relocate_object
  11.95%  rustc    ld-linux-aarch64.so.1                [.] do_lookup_x
   5.55%  rustc    [unknown]                            [k] 0xffffa9ad41cfcfdc
   2.68%  rustc    libc.so.6                            [.] __GI___strlen_asimd
   2.42%  rustc    librustc_driver-1a385c366c35e81a.so  [.] llvm::StringMapImpl::LookupBucketFor
   2.16%  rustc    librustc_driver-1a385c366c35e81a.so  [.] _GLOBAL__sub_I_X86InstructionSelector.cpp
   1.96%  rustc    libstd-990fe978dab76ef3.so           [.] <alloc::vec::Vec<T,A> as core::clone::Clone>::clone
   1.60%  rustc    librustc_driver-1a385c366c35e81a.so  [.] llvm::cl::opt<bool, false, llvm::cl::parser<bool> >::~opt
   1.22%  rustc    ld-linux-aarch64.so.1                [.] strcmp
   0.13%  rustc    ld-linux-aarch64.so.1                [.] stat64
   0.05%  rustc    ld-linux-aarch64.so.1                [.] __minimal_calloc
   0.02%  rustc    ld-linux-aarch64.so.1                [.] __GI___tunables_init
   0.02%  rustc    ld-linux-aarch64.so.1                [.] _dl_start
   0.00%  rustc    [unknown]                            [k] 0xffffa9ad41cfd844
   0.00%  rustc    ld-linux-aarch64.so.1                [.] _start
```
@bors r=Mark-Simulacrum rollup=iffy
⌛ Testing commit 5eeeed1 with merge 62eb31bcf516293ac0a97bb31d11868594239953...
💔 Test failed - checks-actions
:/
@bors retry
☀️ Test successful - checks-actions
Finished benchmarking commit (42f28f9): comparison URL.
Overall result: ✅ improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 645.963s -> 646.615s (0.10%)
Huh, looks like this was a big improvement on hello world for some reason. I guess the compressed debuginfo makes it faster to load the executable into memory? I didn't think perf built rustc with debuginfo though ...
We are building the standard library with debuginfo in CI I think - https://github.com/rust-lang/rust/blob/master/src/ci/run.sh#L95 So this probably makes sense. Let's see if we get any reports of problems -- I'm not sure if we need non-line-tables debuginfo for std for debugging of data structures to work (e.g., in gdb). I guess we have some tests for that and they did work, but sometimes our tests run in environments with different debug levels.
…-Simulacrum
bootstrap: Don't override `debuginfo-level = 1` to mean `line-tables-only`
This has real differences in the effective debuginfo: in particular, it omits the module-level information and makes perf less useful (it can't distinguish "self" from "child" time anymore). Allow passing `line-tables-only` directly in config.toml instead.
See https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/debuginfo.20in.20try.20builds/near/365090631 and https://rust-lang.zulipchat.com/#narrow/stream/238009-t-compiler.2Fmeetings/topic/.5Bsteering.5D.202023-06-09/near/364883519 for more discussion.
This effectively reverts the cargo half of rust-lang#110221 to avoid regressing rust-lang#60020 again in 1.72.
before: 650 MB
line tables only: 335 MB
compressed only: 216 MB
compressed and line tables: 186 MB
no debuginfo at all: 130 MB
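To put the numbers above in proportion, here is a small worked calculation. It is only arithmetic on the five sizes listed; the function names are made up for illustration. Relative to the 130 MB no-debuginfo build, the original 650 MB is 5x and line tables only is roughly 2.6x.

```rust
/// Size of a build relative to the no-debuginfo baseline (1.0 means "same size").
fn ratio_to_baseline(size_mb: f64, baseline_mb: f64) -> f64 {
    size_mb / baseline_mb
}

/// Percentage saved when going from `before_mb` to `after_mb`.
fn percent_saved(before_mb: f64, after_mb: f64) -> f64 {
    (before_mb - after_mb) / before_mb * 100.0
}

fn main() {
    // Sizes in MB, copied from the list in the PR description.
    let before = 650.0;
    let baseline = 130.0;
    let rows = [
        ("before", 650.0),
        ("line tables only", 335.0),
        ("compressed only", 216.0),
        ("compressed and line tables", 186.0),
        ("no debuginfo at all", 130.0),
    ];

    for (label, size) in rows {
        println!(
            "{label:<28} {size:>5.0} MB  {:.2}x baseline  {:>5.1}% smaller than before",
            ratio_to_baseline(size, baseline),
            percent_saved(before, size),
        );
    }
}
```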
Here's an example backtrace:
with `debuginfo=1` (what we emit currently for `debuginfo-level-rustc = 1`)
with `debuginfo=line-tables-only` (what we'll emit for `debuginfo-level-rustc = 1` after this change)
with `debuginfo=0` (what we ship on nightly)
I want to investigate why `-C line-tables-only` is still ~tripling the size of the binary (update: done #110221 (comment)), but this seems like a good improvement in the meantime.
I've tested that both valgrind and perf can read the debuginfo:
To test this, you can run `x build --stage 0 cargo`, set `build.cargo = "build/host/stage0-tools-bin/cargo"`, and then `x build --stage 2 std`. You should be able to compare the rustc_driver.so outputs to each other:
The difference between stage1 and stage2 is `debuginfo=1` vs `debuginfo=line-tables-only`. Both stages have `-gz` (compressed debuginfo) enabled.
This depends on rust-lang/cargo#11958 (and the exact commit of the cargo submodule will need to change before merging).
Helps with #104968.
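For the stage1 vs stage2 comparison described above, a tiny helper like the following could print the on-disk size of each `rustc_driver.so`. This is a sketch only: the paths are deliberately taken from the command line rather than hard-coded, since the exact build-directory layout is not shown here, and `file_size` and `as_mib` are illustrative names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the on-disk size of `path` in bytes.
fn file_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

/// Formats a byte count as mebibytes with one decimal place.
fn as_mib(bytes: u64) -> String {
    format!("{:.1} MiB", bytes as f64 / (1024.0 * 1024.0))
}

fn main() -> io::Result<()> {
    // Pass the shared objects to compare as arguments, e.g. the stage1 and
    // stage2 librustc_driver-*.so files from your own build directory.
    let paths: Vec<String> = std::env::args().skip(1).collect();
    if paths.is_empty() {
        eprintln!("usage: compare-sizes <file>...");
        return Ok(());
    }
    for p in &paths {
        println!("{}\t{}", as_mib(file_size(Path::new(p))?), p);
    }
    Ok(())
}
```

Because `-gz` compresses the debug sections in the file itself, the plain file size is the number that changes; no section-level tooling is needed for a first look.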