Skip to content

reproducible builds broken in rustc 1.56.0 due to LLVM 13 update #90301

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Fuuzetsu opened this issue Oct 26, 2021 · 22 comments · Fixed by #90759
Closed

reproducible builds broken in rustc 1.56.0 due to LLVM 13 update #90301

Fuuzetsu opened this issue Oct 26, 2021 · 22 comments · Fixed by #90759
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Fuuzetsu
Copy link

Fuuzetsu commented Oct 26, 2021

I attach a tarball with a small reproducer (few dependencies from crates.io, lock file, empty lib.rs).

repro.tar.gz

Now run this using rustc 1.56.0:

for i in $(seq 1 10); do cargo clean &&  cargo build --release --quiet && sha256sum target/release/deps/libpalette_derive-*.so; done

I expect the sha256sum to be the same for the libpalette so file but it's often different.

This breaks any binary caches, similarly to what #89904 did but that ticket turned out to be a problem with the crate.

This time it seems problem with rustc itself (or rather LLVM: stay tuned): 1.55.0 produces same binary/crate hash on every build.

Note that in Cargo.toml there's:

[profile.release]
debug = true

Without this, the issue does not present itself. This perhaps makes it look similar to #89911 or #45397 but I think it's different. As far as I saw, it didn't produce a difference in ordering of symbols in the resulting .so though I may be mistaken. Please feel free to close as duplicate if you deem it to be the same issue.

I have ran with RUSTC_LOG=debug and after careful sifting through few GiB of output, I found the differences seem to start in rustc_codegen_ssa.

I have bisected rustc itself and ran it on our original code which exhibited the problem. I started the bisect at common merge point with 1.55.0 though the problem turned out well into the 1.56.0 release:

$ git bisect good
db002a06ae9154a35d410550bc5132df883d7baa is the first bad commit
commit db002a06ae9154a35d410550bc5132df883d7baa
Merge: e7f7fe462a5 306259c6459
Author: bors <bors@rust-lang.org>
Date:   Sat Aug 21 09:25:28 2021 +0000

    Auto merge of #87570 - nikic:llvm-13, r=nagisa
    
    Upgrade to LLVM 13
    
    Work in progress update to LLVM 13. Main changes:
    
     * InlineAsm diagnostics reported using SrcMgr diagnostic kind are now handled. Previously these used a separate diag handler.
     * Codegen tests are updated for additional attributes.
     * Some data layouts have changed.
     * Switch `#[used]` attribute from `llvm.used` to `llvm.compiler.used` to avoid SHF_GNU_RETAIN flag introduced in https://reviews.llvm.org/D97448, which appears to trigger a bug in older versions of gold.
     * Set `LLVM_INCLUDE_TESTS=OFF` to avoid Python 3.6 requirement.
    
    Upstream issues:
    
     * ~~https://bugs.llvm.org/show_bug.cgi?id=51210 (InlineAsm diagnostic reporting for module asm)~~ Fixed by https://github.com/llvm/llvm-project/commit/1558bb80c01b695ce12642527cbfccf16cf54ece.
     * ~~https://bugs.llvm.org/show_bug.cgi?id=51476 (Miscompile on AArch64 due to incorrect comparison elimination)~~ Fixed by https://github.com/llvm/llvm-project/commit/81b106584f2baf33e09be2362c35c1bf2f6bfe94.
     * https://bugs.llvm.org/show_bug.cgi?id=51207 (Can't set custom section flags anymore). Problematic change reverted in our fork, https://reviews.llvm.org/D107216 posted for upstream revert.
     * https://bugs.llvm.org/show_bug.cgi?id=51211 (Regression in codegen for #83623). This is an optimization regression that we may likely have to eat for this release. The fix for #83623 was based on an incorrect premise, and this needs to be properly addressed in the MergeICmps pass.
    
    The [compile-time impact](https://perf.rust-lang.org/compare.html?start=ef9549b6c0efb7525c9b012148689c8d070f9bc0&end=0983094463497eec22d550dad25576a894687002) is mixed, but quite positive as LLVM upgrades go.
    
    The LLVM 13 final release is scheduled for Sep 21st. The current nightly is scheduled for stable release on Oct 21st.
    
    r? `@ghost`

 .gitmodules                                        |  2 +-
 compiler/rustc_codegen_llvm/src/back/write.rs      | 43 +---------
 compiler/rustc_codegen_llvm/src/base.rs            |  8 +-
 compiler/rustc_codegen_llvm/src/consts.rs          | 15 +++-
 compiler/rustc_codegen_llvm/src/context.rs         | 55 +++++++++----
 compiler/rustc_codegen_llvm/src/lib.rs             |  2 +-
 compiler/rustc_codegen_llvm/src/llvm/diagnostic.rs | 94 +++++++++++++++++-----
 compiler/rustc_codegen_llvm/src/llvm/ffi.rs        |  7 +-
 compiler/rustc_codegen_ssa/src/traits/misc.rs      |  2 +
 compiler/rustc_codegen_ssa/src/traits/statics.rs   | 18 ++---
 compiler/rustc_feature/src/accepted.rs             |  2 +-
 compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp   | 20 ++++-
 .../src/spec/powerpc64_unknown_linux_gnu.rs        |  2 +-
 .../src/spec/powerpc64_unknown_linux_musl.rs       |  2 +-
 .../rustc_target/src/spec/powerpc64_wrs_vxworks.rs |  2 +-
 .../src/spec/powerpc64le_unknown_linux_gnu.rs      |  2 +-
 .../src/spec/powerpc64le_unknown_linux_musl.rs     |  2 +-
 .../src/spec/wasm32_unknown_emscripten.rs          |  2 +-
 .../src/spec/wasm32_unknown_unknown.rs             |  2 +-
 compiler/rustc_target/src/spec/wasm32_wasi.rs      |  2 +-
 .../src/spec/wasm64_unknown_unknown.rs             |  2 +-
 src/bootstrap/native.rs                            |  1 +
 src/llvm-project                                   |  2 +-
 src/test/codegen/array-equality.rs                 |  4 +-
 src/test/codegen/issue-83623-SIMD-PartialEq.rs     | 46 -----------
 src/test/codegen/repeat-trusted-len.rs             |  2 +-
 .../run-make-fulldeps/coverage-llvmir/Makefile     |  2 -
 .../coverage-llvmir/filecheck.testprog.txt         | 18 ++---
 src/test/ui/llvm-asm/issue-69092.rs                |  4 +-
 src/test/ui/llvm-asm/issue-69092.stderr            |  4 +-
 30 files changed, 201 insertions(+), 168 deletions(-)
 delete mode 100644 src/test/codegen/issue-83623-SIMD-PartialEq.rs

cc @nikic who did the update

Meta

rustc --version --verbose:

$ rustc --version --verbose
rustc 1.56.0 (09c42c458 2021-10-18)
binary: rustc
commit-hash: 09c42c45858d5f3aedfa670698275303a3d19afa
commit-date: 2021-10-18
host: x86_64-unknown-linux-gnu
release: 1.56.0
LLVM version: 13.0.0
@Fuuzetsu Fuuzetsu added the C-bug Category: This is a bug. label Oct 26, 2021
@bjorn3 bjorn3 added A-reproducibility Area: Reproducible / deterministic builds T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 26, 2021
@yanok
Copy link
Contributor

yanok commented Oct 29, 2021

One thing I found: if I run it with RUSTFLAGS='--emit=llvm-ir' I can't reproduce it anymore.

@rustbot claim

@yanok
Copy link
Contributor

yanok commented Oct 30, 2021

Well, I was wrong. Still reproducible even with RUSTFLAGS='--emit=llvm-ir'. Produced *.ll files are identical though.

@fangism
Copy link

fangism commented Oct 31, 2021

Well, I was wrong. Still reproducible even with RUSTFLAGS='--emit=llvm-ir'. Produced *.ll files are identical though.

Consistent with my findings with --emit=llvm-ir as well.

@yanok
Copy link
Contributor

yanok commented Nov 8, 2021

Update: this IS a LLVM bug, I'm able to reproduce it with just ll input file using mainline LLVM. Filed https://bugs.llvm.org/show_bug.cgi?id=52441 to LLVM.

@yanok
Copy link
Contributor

yanok commented Nov 9, 2021

I have a fix: https://reviews.llvm.org/D113468

Do we want to apply it to llvm-rust or are we going to wait for LLVM review?

yanok added a commit to yanok/rust that referenced this issue Nov 10, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 10, 2021
Update llvm submodule

This includes debug info generation fix, that fixes rust-lang#90301.

Also includes WASM backend related fix for https://bugs.llvm.org/show_bug.cgi?id=52352
(I haven't found a corresponding Rust bug).
@bors bors closed this as completed in 8128916 Nov 10, 2021
@Fuuzetsu
Copy link
Author

@yanok Good job, thank you!

@glandium
Copy link
Contributor

I wonder if rust needs https://reviews.llvm.org/D108968 too.

@glandium
Copy link
Contributor

I wonder if rust needs https://reviews.llvm.org/D108968 too.

Answering my own question: it does.

@glandium
Copy link
Contributor

@yanok Could you also do a PR for https://reviews.llvm.org/D108968?

@yanok
Copy link
Contributor

yanok commented Nov 16, 2021

Sure, will do.

@yanok
Copy link
Contributor

yanok commented Nov 17, 2021

Created #90978

@yanok
Copy link
Contributor

yanok commented Nov 17, 2021

Actually I was too slow, there is already #90954

@glandium
Copy link
Contributor

FWIW, while taking the 1.56.0 source and applying the 2 LLVM reproducibility patches fixes the issues we have with Firefox, the latest nightly doesn't: there's a remaining reproducibility issue on i686 linux. I'll test a patched beta next.

@cuviper
Copy link
Member

cuviper commented Nov 19, 2021

The next beta release should be fixed, per #90938.
(Should go out near 00:00 UTC -- I guess it will be 1.57.0-beta.4.)

@glandium
Copy link
Contributor

the beta branch + the patches seems to fix it, but I've found something new: depending how llvm is built, the result can vary. Specifically, llvm built with a sysroot exhibits reproducibility issues that are not present when not building with a sysroot. I'll wait for the official build for 1.57.0-beta.4 for a definite answer whether the issue is fixed. One thing is sure, though, there's an additional regression in nightly.

@glandium
Copy link
Contributor

So, as far as Firefox is concerned, everything is fixed in the latest betas. I did find a source of non-determinism that is not fixed, but a) it seems to also affects clang (as in, I also have found one in clang, and I think it's the same root cause) b) it goes away with PGO, so it doesn't actually affect Firefox.

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 1, 2021

@glandium is there a ticket about this one somewhere? It's great that it doesn't impact Firefox but it may be impacting other rust software

@glandium
Copy link
Contributor

glandium commented Dec 2, 2021

I haven't filed it because the smallest reproducer I have at the moment is "build firefox with these flags and observe how sometimes some functions have an extra mov", and from experience, those don't lead to any action.

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 3, 2021

I haven't filed it because the smallest reproducer I have at the moment is "build firefox with these flags and observe how sometimes some functions have an extra mov", and from experience, those don't lead to any action.

I see, thanks. It sounds like binary-reproducability thing and ABI should(?) be the same at least.

And this is a regression in new LLVM version (rustc 1.56 and beyond)?

@glandium
Copy link
Contributor

glandium commented Dec 3, 2021

I don't know if this affects rustc, but I narrowed down yet another source of non-determinism in debuginfo, and upstream came up with a patch: https://reviews.llvm.org/D115054

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 6, 2021

We tried using 1.57.0 and it seems broken still. I'll try to get a repro tomorrow. Should I open a new ticket or will someone reopen this one?

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 7, 2021

We tried using 1.57.0 and it seems broken still. I'll try to get a repro tomorrow. Should I open a new ticket or will someone reopen this one?

Actually, seems it comes from a derive crate in our dependency tree, similar to what we saw in #89904 ; So maybe it's fine.

EDIT: JelteF/derive_more#178

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants