-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Cross-compiling Rust to s390x yields a faulty toolchain #80810
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Comments
Assigning P-high as discussed as part of the Prioritization Working Group procedure and removing I-prioritize. |
This has come up before when some part of the crate metadata is accidentally written in native-endian order -- so x86_64 rustc writes some little-endian data in the standard library rlibs, and s390x rustc reads that as big-endian. |
I can reproduce this on RHEL 7 s390x, but not ppc64, so it would seem to be something other than endian order. |
I know you said based on testing it doesn't seem to be endian order that's causing this issue, but in general can't we just standardize on either little-endian or big-endian somehow? |
searched nightlies: from nightly-2020-10-01 to nightly-2021-01-08 bisected with cargo-bisect-rustc v0.6.0Host triple: s390x-unknown-linux-gnu cargo bisect-rustc --start 2020-10-01 --end 2021-01-08 -- check That's the merge for #79637, a revert, which doesn't really make sense to me... |
Since this may be an rlib-related issue, did you have cargo-bisect-rustc run |
I think we do normalize to/from little endian, but it's possible something was missed. |
Doesn't it use a new target dir by default? That seems implied by the option to |
I think you're right – I forgot about that :) |
Sorry, I don't have time right now to check this properly but a quick answer I can give you now is: |
@Jakob-Naucke @cuviper is it easy for you to test against #80732?, we did a try build so you can install the compiler with that PR included using rustup-toolchain-install-master. |
@spastorino I'm afraid try does not generate an s390x target:
(but |
@spastorino yes, that's working. Built with
|
I do worry that we're still missing a root cause, because it doesn't look like that code should behave differently for cross compiling, let alone s390x in particular. That could be a totally unrelated codegen bug though, which just happens to get tickled the wrong way by this change. |
@Jakob-Naucke checkin in on this issue. Do you need a feedback? Do we have enough info to assess a context? I'm trying to figure out the status of this issue and the next steps. thanks! |
@apiraino By context, do you mean a category label? I'd say that this is a bug. There is a PR that fixes this, but it's blocked at the moment since it possibly breaks |
Even if it lands soon, #80732 isn't nominated for a beta backport, and it looks like it may be too invasive for that. I wonder if you can run a So if that build works with otherwise identical toolchain bits, then maybe we can look for binary differences with the cross-compiled build. If that native build fails, maybe we can find a difference from your "normal" native toolchain. |
Using Output
It turns out that the file
This file is indeed generated when building on x86_64 for s390x. When building for x86_64 on x86_64, the file |
I literally just got to that point of missing Maybe we can try just a plain native build of GCC 5.2.0 and see how that compares. The C/C++ toolchain might not matter much anyway, since the problem we're seeing is in the type system, firmly in Rust's domain. So I guess a potential codegen issue would just be in LLVM, less likely to be a two-level miscompilation of LLVM causing miscompilation of Rust. Sorry for all the guesswork -- I'm just trying to find traction on the underlying issue. I think I will next try a detailed comparison of running the native beta vs. cross-compiled beta, regardless of the system toolchain difference. |
Well, I just encountered this on a native build of the complete toolchain for s390x:
Before when I had tried native, I was only building There is also an ICE in that log, but it doesn't look directly related:
|
I ran a new native bisection using the |
I noticed that a bunch of |
Hm, I think it is fine if we revert that PR, as it was an attempt at optimizing the ABI, but we should also see if we can make a minimal reproducer so that we can pinpoint exactly what component is to blame here and if its LLVM – report it upstream. |
FWIW, after a successful scratch build, I pushed this revert patch to Fedora so we can ship 1.50. I'll continue trying to narrow down the effect before taking action here in rust-lang. |
Since the Rust 1.50.0 the s390x CI test job has been failing. This looks to be caused by rust-lang/rust#80810. Until the issue is resolved in a released version of rust this pins the rust version used in the job to the previous release 1.49.0 which does not have this issue and should work fine.
* Pin rust in s390x test job Since the Rust 1.50.0 the s390x CI test job has been failing. This looks to be caused by rust-lang/rust#80810. Until the issue is resolved in a released version of rust this pins the rust version used in the job to the previous release 1.49.0 which does not have this issue and should work fine. * Call rustup in the before_install stage
Filed: https://bugs.llvm.org/show_bug.cgi?id=49322 That came from analysis of a stage1 testing failure in rust/src/test/ui/proc-macro/auxiliary/first-second.rs Lines 10 to 15 in 15598a8
I see
This might not be the only issue, but it was the first thing that I could positively isolate. |
Excellent job with the debugging! Very impressive minimization. |
On the Rust side, maybe |
A fix for LLVM 49322 has landed, and I confirmed it does fix my minimization, but that Rust test still fails. It seems I must have minimized too much, so I'll have to step back and see what else went wrong. |
Scratch that, I think something in my build failed to re-link, even though I thought I had cleaned it sufficiently. I couldn't figure out why my rustc was still clobbering stack in that test, while its build of llc did the right thing, so I ran a completely clean rebuild with the fix D97514 applied and now it is fine. I confirmed both on the regressed commit 0183b41 and on 1.50.0 final. So, yay! 🎉 I'll prepare a backport for rust-lang/llvm-project which we can update into 1.52-nightly, and then we can decide whether to also backport for 1.51-beta before that moves to stable. |
@cuviper thanks for tracking this down! If possible, it would be great to the have the fix in 1.51 ... |
I've confirmed that both are working on the original reproducer and on the |
Code
I tried this code on s390x:
(originally discovered in and derived to make an MWE from gimli) with all of the following:
DEPLOY=1 ./src/ci/docker/run.sh dist-s390x-linux
on x86_64 and installed on s390xI expected to see this happen: a successful compiler run, as is on x86_64
Instead, this happened:
Furthermore, when building Rust with
x.py
on s390x after the switch to the aforementioned 2020-12-30 stage0 (introduced infe031180
), an ICE is claimed:Output
Rust compiled on s390x natively (immediately prior to that introduction) does not have this issue.
Since the Rust dists are built through cross-compilation for Tier 2 architectures, I believe that there is a problem specific to the cross-compilation with an s390x target.
Versions it worked on
It most recently worked on:
Versions with regression
e.g.
Backtrace
no additional info for code example, ICE output had backtrace enabled
@rustbot modify labels: +regression-from-stable-to-beta -regression-untriaged
The text was updated successfully, but these errors were encountered: