Reproducible builds regression in nightly #47135

Closed
kpcyrd opened this issue Jan 2, 2018 · 13 comments
Labels
P-medium: Medium priority
regression-from-stable-to-nightly: Performance or correctness regression from stable to nightly.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kpcyrd

kpcyrd commented Jan 2, 2018

hello,

I'm running a CI system with reprotest to ensure the binaries built from the project are reproducible and verifiable. This system started to fail between 2017-12-19 and 2017-12-26:

Build 193, 2017-12-19T00:56:25Z, e6446ad65d193e0155ac02d58f338f9136182267, https://travis-ci.org/kpcyrd/sniffglue/jobs/318377138
Build 196, 2017-12-26T01:20:33Z, e6446ad65d193e0155ac02d58f338f9136182267, https://travis-ci.org/kpcyrd/sniffglue/jobs/321613070

I assume this is a regression in Rust nightly, since I can reproduce the test failure locally with a current nightly. My tests pass when switching from nightly to stable (with some magic to access -Zremap-path-prefix-{from,to}).

My testsuite looks like this:

#!/bin/sh
set -xue

# tested with rustc 1.22.1 and cargo 0.23.0

# by default, the build folder is located in /tmp, which is a tmpfs. The target/ folder
# can become quite large, causing the build to fail if we don't have enough RAM.
export TMPDIR="$HOME/tmp/repro-test"
mkdir -p "$TMPDIR"

reprotest -vv --vary=-time,-domain_host --source-pattern 'Cargo.* src/' '
    RUSTC_BOOTSTRAP=1 CARGO_HOME="$PWD/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
        RUSTFLAGS="-Zremap-path-prefix-from=$HOME -Zremap-path-prefix-to=/remap-home -Zremap-path-prefix-from=$PWD -Zremap-path-prefix-to=/remap-pwd" \
        cargo build --release --verbose' \
    target/release/sniffglue

You can run this yourself using:

git clone https://github.com/kpcyrd/sniffglue.git
cd sniffglue
docker build -t reprotest-sniffglue -f docs/Dockerfile.reprotest .
docker run --privileged reprotest-sniffglue ci/reprotest.sh

The full diffoscope report is quite large; the gist looks like this:

INFO:reprotest:build successful, copying artifacts
INFO:reprotest:copying /root/tmp/repro-test/reprotest.QwImZ4/artifacts-experiment-1/ back from virtual server's /root/tmp/repro-test/tmp29t413le/experiment-1
INFO:reprotest:Running diffoscope: ['diffoscope', '--exclude-directory-metadata', '/root/tmp/repro-test/tmp29t413le/control', '/root/tmp/repro-test/tmp29t413le/experiment-1']
--- /root/tmp/repro-test/tmp29t413le/control
+++ /root/tmp/repro-test/tmp29t413le/experiment-1
├── source-root
│ ├── target
│ │ ├── release
│ │ │ ├── sniffglue
│ │ │ │ ├── readelf --wide --file-header {}
│ │ │ │ │ @@ -6,15 +6,15 @@
│ │ │ │ │    OS/ABI:                            UNIX - System V
│ │ │ │ │    ABI Version:                       0
│ │ │ │ │    Type:                              DYN (Shared object file)
│ │ │ │ │    Machine:                           Advanced Micro Devices X86-64
│ │ │ │ │    Version:                           0x1
│ │ │ │ │    Entry point address:               0x15410
│ │ │ │ │    Start of program headers:          64 (bytes into file)
│ │ │ │ │ -  Start of section headers:          7624168 (bytes into file)
│ │ │ │ │ +  Start of section headers:          7624176 (bytes into file)
│ │ │ │ │    Flags:                             0x0
│ │ │ │ │    Size of this header:               64 (bytes)
│ │ │ │ │    Size of program headers:           56 (bytes)
│ │ │ │ │    Number of program headers:         10
│ │ │ │ │    Size of section headers:           64 (bytes)
│ │ │ │ │    Number of section headers:         44
│ │ │ │ │    Section header string table index: 43
│ │ │ │ ├── readelf --wide --sections {}
│ │ │ │ │ @@ -1,8 +1,8 @@
│ │ │ │ │ -There are 44 section headers, starting at offset 0x7455e8:
│ │ │ │ │ +There are 44 section headers, starting at offset 0x7455f0:
│ │ │ │ │  
│ │ │ │ │  Section Headers:
│ │ │ │ │    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
│ │ │ │ │    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
│ │ │ │ │    [ 1] .interp           PROGBITS        0000000000000270 000270 00001c 00   A  0   0  1
│ │ │ │ │    [ 2] .note.ABI-tag     NOTE            000000000000028c 00028c 000020 00   A  0   0  4
│ │ │ │ │    [ 3] .note.gnu.build-id NOTE            00000000000002ac 0002ac 000024 00   A  0   0  4
│ │ │ │ │ @@ -40,14 +40,14 @@
│ │ │ │ │    [35] .debug_str        PROGBITS        0000000000000000 4704b8 0e5241 01  MS  0   0  1
│ │ │ │ │    [36] .debug_loc        PROGBITS        0000000000000000 5556f9 0c881e 00      0   0  1
│ │ │ │ │    [37] .debug_macinfo    PROGBITS        0000000000000000 61df17 000041 00      0   0  1
│ │ │ │ │    [38] .debug_pubtypes   PROGBITS        0000000000000000 61df58 02223b 00      0   0  1
│ │ │ │ │    [39] .debug_ranges     PROGBITS        0000000000000000 640193 07ef50 00      0   0  1
│ │ │ │ │    [40] .debug_macro      PROGBITS        0000000000000000 6bf0e3 013d65 00      0   0  1
│ │ │ │ │    [41] .symtab           SYMTAB          0000000000000000 6d2e48 02fee0 18     42 5529  8
│ │ │ │ │ -  [42] .strtab           STRTAB          0000000000000000 702d28 0426f8 00      0   0  1
│ │ │ │ │ -  [43] .shstrtab         STRTAB          0000000000000000 745420 0001c6 00      0   0  1
│ │ │ │ │ +  [42] .strtab           STRTAB          0000000000000000 702d28 042702 00      0   0  1
│ │ │ │ │ +  [43] .shstrtab         STRTAB          0000000000000000 74542a 0001c6 00      0   0  1
│ │ │ │ │  Key to Flags:
│ │ │ │ │    W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
│ │ │ │ │    L (link order), O (extra OS processing required), G (group), T (TLS),
│ │ │ │ │    C (compressed), x (unknown), o (OS specific), E (exclude),
│ │ │ │ │    l (large), p (processor specific)
│ │ │ │ ├── readelf --wide --symbols {}
│ │ │ │ │ @@ -5676,183 +5676,183 @@
│ │ │ │ │    5523: 000000000017d284   320 FUNC    LOCAL  DEFAULT   14 backtrace_dwarf_add
│ │ │ │ │    5524: 00000000001af930   104 FUNC    LOCAL  DEFAULT   14 je_malloc_tsd_boot1
│ │ │ │ │    5525: 0000000000197b50   132 FUNC    LOCAL  DEFAULT   14 je_chunk_boot
│ │ │ │ │    5526: 000000000019fc30    12 FUNC    LOCAL  DEFAULT   14 je_ctl_prefork
│ │ │ │ │    5527: 0000000000453958     0 OBJECT  LOCAL  DEFAULT   24 _DYNAMIC
│ │ │ │ │    5528: 0000000000177d0d    92 FUNC    LOCAL  DEFAULT   14 backtrace_release_view
│ │ │ │ │    5529: 000000000011c3f0  2038 FUNC    GLOBAL DEFAULT   14 _ZN5regex3dfa3Fsm12cached_state17hb554e8bfc5200e27E
│ │ │ │ │ -  5530: 00000000000bb960    30 FUNC    GLOBAL HIDDEN    14 _ZN4core3ptr13drop_in_place17hf44fe1997133c74dE.llvm.57E7137B
│ │ │ │ │ -  5531: 00000000000cb740    14 FUNC    GLOBAL DEFAULT   14 _ZN147_$LT$clap..args..arg_builder..option..OptBuilder$LT$$u27$n$C$$u20$$u27$e$GT$$u20$as$u20$clap..args..any_arg..AnyArg$LT$$u27$n$C$$u20$$u27$e$GT$$GT$8max_vals17h3d6e1bfec0acc71aE
│ │ │ │ │ -  5532: 000000000011a720   711 FUNC    GLOBAL HIDDEN    14 _ZN5regex8literals15LiteralSearcher3new17ha6ddbdce121a0edaE.llvm.FC380A3B
│ │ │ │ │ -  5533: 000000000012ad70   283 FUNC    GLOBAL HIDDEN    14 _ZN49_$LT$alloc..raw_vec..RawVec$LT$T$C$$u20$A$GT$$GT$7reserve17hda1b026b7e50ce87E
│ │ │ │ │ -  5534: 0000000000128f30    48 FUNC    GLOBAL HIDDEN    14 _ZN4core3ptr13drop_in_place17h277abafcc961cdd0E.llvm.DA02C37B
│ │ │ │ │ -  5535: 0000000000026130  1464 FUNC    GLOBAL HIDDEN    14 _ZN49_$LT$std..sync..mpsc..stream..Packet$LT$T$GT$$GT$4recv17hf720d50b94f350d7E
│ │ │ │ │ -  5536: 00000000000e2e50   433 FUNC    GLOBAL HIDDEN    14 _ZN65_$LT$clap..fmt..Format$LT$T$GT$$u20$as$u20$core..fmt..Display$GT$3fmt17hd3283ea0c9e52cdaE
│ │ │ │ │ -  5537: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND seccomp_load
│ │ │ │ │ -  5538: 000000000044b270    48 OBJECT  GLOBAL HIDDEN    23 vtable.o.llvm.FE355FBC
[...]
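
The size differences in the excerpt can be cross-checked with shell arithmetic. This is a small sketch that uses only the offsets and sizes printed by readelf above; the 8-byte alignment of the section header table is an assumption that happens to match both builds, and the `align8` helper is hypothetical:

```shell
#!/bin/sh
# Values copied from the readelf output above (control vs. experiment-1).
strtab_growth=$(( 0x042702 - 0x0426f8 ))    # .strtab size difference
shstrtab_shift=$(( 0x74542a - 0x745420 ))   # .shstrtab offset difference

# Assumption: the section header table starts at the next 8-byte boundary
# after the end of .shstrtab (offset + size 0x1c6).
align8() { echo $(( ($1 + 7) & ~7 )); }
control_shoff=$(align8 $(( 0x745420 + 0x1c6 )))
experiment_shoff=$(align8 $(( 0x74542a + 0x1c6 )))

echo "strtab grew by $strtab_growth bytes"
echo "shstrtab moved by $shstrtab_shift bytes"
echo "section headers start at: $control_shoff vs $experiment_shoff"
```

In the excerpt shown, .strtab is 10 bytes longer in the second build, which moves .shstrtab by 10 bytes and, after alignment, the section header table by 8 bytes. That is consistent with symbol names differing between the builds (for example the `.llvm.` suffixes in the symbol table).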

cc: @infinity0

@alexcrichton
Member

Thanks for the report! Do you perhaps have a more isolated reproduction of this? A naive attempt to reproduce this locally (running cargo build twice and seeing what changes) unfortunately didn't reproduce it.
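
For reference, a naive double-build check along those lines might look like the sketch below. The function names and the default artifact path are assumptions for illustration, not something taken from the project's CI:

```shell
#!/bin/sh
# same_artifact FILE_A FILE_B: succeed only if both files are byte-identical.
same_artifact() {
    cmp -s "$1" "$2"
}

# naive_check [BIN]: build, keep a copy, rebuild from scratch, compare.
# BIN is an assumed artifact path; adjust it for the project under test.
naive_check() {
    bin=${1:-target/release/sniffglue}
    first=$(mktemp)
    cargo build --release && cp "$bin" "$first" &&
        cargo clean && cargo build --release &&
        same_artifact "$first" "$bin"
}
```

Unlike reprotest, this keeps the build path, user, and environment identical between the two builds, so it can only catch nondeterminism that shows up without any variation.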

@infinity0
Contributor

If you run reprotest --auto-build it will hopefully tell you which variation (e.g. time / timezone / fileordering) is causing the non-determinism.

Can we get a longer listing of the readelf --wide --symbols {} section? From what you pasted it looks like some table is getting re-ordered.

tag #34902

@kpcyrd
Author

kpcyrd commented Jan 3, 2018

hey @alexcrichton and @infinity0, thanks for taking the time to look into this.

The project I'm testing covers a large number of edge cases (e.g. FFI), so I'll probably need some time to narrow this down to the specific edge case that is causing problems.

I tried --auto-build --vary=-time,-domain_host but it's not able to find a working configuration:

Not reproducible, even when fixing as much as reprotest knows how to. :(

I've attached a gist containing both diffscope.out, generated by reprotest, and diffoscope.json, generated with:

diffoscope --json artifacts/diffoscope.json artifacts/control/source-root/target/release/sniffglue artifacts/experiment-1/source-root/target/release/sniffglue

https://gist.github.com/kpcyrd/ac5c8a4d8837d18d5f7f5bc074b71924

This was generated using:

$ rustup run nightly -- rustc --version
rustc 1.24.0-nightly (b65f0bedd 2018-01-01)
$ cargo +nightly version
cargo 0.25.0-nightly (a88fbace4 2017-12-29)
$ 

If it helps, I can offer to write a script that tests nightly-2017-12-19 through nightly-2017-12-26 until it finds the nightly that broke (a full rustc bisect would probably take me a while to set up).

@gsollazzo gsollazzo added the regression-from-stable-to-nightly Performance or correctness regression from stable to nightly. label Feb 1, 2018
@nikomatsakis nikomatsakis added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 8, 2018
@nikomatsakis
Contributor

Nominating for prioritization in @rust-lang/compiler meeting.

@nikomatsakis
Contributor

@kpcyrd

Hi! I'm trying to figure out what is happening here. In particular, I can't tell how reliably this can be reproduced etc.

> If it helps, I can offer to write a script that tests nightly-2017-12-19 through nightly-2017-12-26 until it finds the nightly that broke (a full rustc bisect would probably take me a while to set up).

If it is possible to bisect over nightlies, that would be tremendously helpful.

@nikomatsakis
Contributor

triage: P-medium

It is our intention to support reproducible builds, but they are not fully supported; calling P-medium. It'd be great to fix though.

@rust-highfive rust-highfive added P-medium Medium priority and removed I-nominated labels Feb 15, 2018
@kpcyrd
Author

kpcyrd commented Feb 18, 2018

Sorry for my late reply. I've tested every nightly from 2017-12-19 to 2017-12-26:

nightly-2017-12-19: ✅ reproducible
nightly-2017-12-20: ✅ reproducible
nightly-2017-12-21: ✅ reproducible
nightly-2017-12-22: ✅ reproducible
nightly-2017-12-23: ✅ reproducible
nightly-2017-12-24: ✅ reproducible
nightly-2017-12-25: ✅ reproducible
nightly-2017-12-26: ❎ unreproducible
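
A linear scan like the one behind this list can be sketched as follows. The `first_bad` helper is hypothetical, and the usage line assumes a check script that takes the day of the month as its only argument and exits non-zero when the build is not reproducible:

```shell
#!/bin/sh
# first_bad CHECK FIRST LAST: run CHECK with each day number from FIRST to
# LAST and print the first day for which CHECK fails.
first_bad() {
    check=$1
    day=$2
    while [ "$day" -le "$3" ]; do
        if ! "$check" "$day" >/dev/null 2>&1; then
            echo "$day"
            return 0
        fi
        day=$((day + 1))
    done
    return 1   # every day in the range passed
}

# Hypothetical usage, assuming the check script accepts the day as "$1":
#   first_bad ci/reprotest.sh 19 26
```

With only eight nightlies a linear scan is cheap; for a longer range, a binary search over the dates would need fewer builds.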

@kpcyrd
Author

kpcyrd commented Feb 18, 2018

This is how I set up my tests:

git clone https://github.com/kpcyrd/sniffglue.git
cd sniffglue
git branch repro e6446ad65d193e0155ac02d58f338f9136182267
git checkout repro

Apply this patch:

diff --git a/ci/reprotest.sh b/ci/reprotest.sh
index 5bdd378..ecc48b7 100755
--- a/ci/reprotest.sh
+++ b/ci/reprotest.sh
@@ -8,8 +8,10 @@ set -xue
 export TMPDIR="$HOME/tmp/repro-test"
 mkdir -p "$TMPDIR"
 
+rustup install "nightly-2017-12-$1"
+
 reprotest -vv --vary=-time,-domain_host --source-pattern 'Cargo.* src/' '
-    RUSTC_BOOTSTRAP=1 CARGO_HOME="$PWD/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
+    RUSTC_BOOTSTRAP=1 CARGO_HOME="'$HOME'/.cargo" RUSTUP_HOME='"$HOME/.rustup"' \
         RUSTFLAGS="-Zremap-path-prefix-from=$HOME -Zremap-path-prefix-to=/remap-home -Zremap-path-prefix-from=$PWD -Zremap-path-prefix-to=/remap-pwd" \
-        cargo build --release --verbose' \
+        rustup run nightly-2017-12-'$1' cargo build --release --verbose' \
     target/release/sniffglue

Build the test container and run the tests:

# build container (reprotest-sniffglue)
BUILD_MODE=reprotest ci/build.sh
# test nightlies
docker run --privileged reprotest-sniffglue sh -c '(for x in `seq 19 26`; do ci/reprotest.sh "$x"; done)' | tee repro-regression.log

I can reproduce it reliably this way.

@michaelwoerister
Member

Maybe this is because of multiple codegen units + ThinLTO? We enabled that by default during that time, didn't we, @alexcrichton? (#46910)

Does it also reproduce if you add -Ccodegen-units=1 to RUSTFLAGS?

@kpcyrd
Author

kpcyrd commented Feb 19, 2018

@michaelwoerister The binary was reproducible with -Ccodegen-units=1 on both nightly-2017-12-26 and 2018-02-17. Nice!
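
As a sketch of how that workaround could be wired into a build script, the helper below appends the flag to an existing RUSTFLAGS value. The `add_single_cgu` name is hypothetical, and the check only recognizes the `-Ccodegen-units=` spelling without a space after `-C`:

```shell
#!/bin/sh
# add_single_cgu FLAGS: print FLAGS with -Ccodegen-units=1 appended,
# unless a -Ccodegen-units= setting is already present.
add_single_cgu() {
    case " $1 " in
        *" -Ccodegen-units="*) echo "$1" ;;
        "  ") echo "-Ccodegen-units=1" ;;
        *) echo "$1 -Ccodegen-units=1" ;;
    esac
}

# Hypothetical usage:
#   RUSTFLAGS=$(add_single_cgu "${RUSTFLAGS:-}") cargo build --release
```

The trade-off is that a single codegen unit gives up parallel code generation, so release builds are likely to take longer.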

I'm by no means an expert on these features, but would it be possible to run codegen concurrently, wait until all units finish, and then sort the results before using them?
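
The general "run in parallel, join, then sort" idea can be illustrated with a toy shell model. This is only a sketch of the pattern and says nothing about how rustc schedules or merges codegen units; the unit names are made up:

```shell
#!/bin/sh
# Toy model only: each "unit" runs as a background job and appends its name
# to a shared log, so the order of lines in the log depends on scheduling.
out=$(mktemp)
for unit in cgu-c cgu-a cgu-b; do
    ( echo "$unit" >> "$out" ) &
done
wait   # block until every background job has finished

# Sorting after the join makes the final order independent of scheduling.
sorted=$(sort "$out" | paste -sd' ' -)
echo "$sorted"
rm -f "$out"
```

Sorting only helps when the nondeterminism is in the order of the results; it would not help if the content of an individual result (such as a generated symbol suffix) differs between runs.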

@alexcrichton
Member

I believe this is fixed in nightly now? I'm not really sure why though. I bisected the PR that fixed this to #47522, although nothing there looks related to reproducible builds.

#47467 seems the most likely, but if that's true then it may mean that the bug is still lurking and hidden rather than fixed.

@kpcyrd
Author

kpcyrd commented Feb 20, 2018

@alexcrichton You're right, I forgot to re-test nightly. I rebuilt with nightly a couple of times and everything was working nicely. I've re-enabled this test and started adding it to another project as well. I'll let you know if I notice anything again.

Thanks everybody!

@kpcyrd kpcyrd closed this as completed Feb 20, 2018
@michaelwoerister
Member

Maybe ThinLTO is not entirely deterministic. I wouldn't be surprised.


7 participants