manylinux_2_34 x86-64 builds produce binaries that are not compatible with all x86-64 CPUs #1725

Open
alex opened this issue Dec 12, 2024 · 10 comments

Comments

@alex
Member

alex commented Dec 12, 2024

manylinux_2_34 is built on AlmaLinux 9. AlmaLinux is built for the x86-64-v2 sub-architecture, which assumes that a particular set of x86-64 CPU extensions is available. (See https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level#recommendations_for_rhel_9)

As a result, wheels built in manylinux_2_34 by the system compiler will use these CPU extensions, making the wheels incompatible with some x86-64 CPUs, which of course results in SIGILL at runtime.

Because wheel tags have no awareness of x86-64-v2, this effectively makes binaries built with manylinux_2_34 unusable.

See pyca/cryptography#12069 for an example of the impact of this.
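
To check whether a given Linux machine would be affected, one option is to compare its CPU flags against the x86-64-v2 feature set. The following is a minimal sketch, not part of any manylinux tooling: the function names are hypothetical, and the flag spellings assume the Linux `/proc/cpuinfo` format (where SSE3 is reported as `pni`).

```python
# x86-64-v2 adds these features on top of baseline x86-64, spelled the way
# Linux reports them in /proc/cpuinfo ("pni" is the kernel's name for SSE3).
X86_64_V2_FLAGS = frozenset(
    {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
)


def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return set(value.split())
    return set()


def missing_v2_flags(cpuinfo_text):
    """Return the x86-64-v2 flags that the described CPU does not advertise."""
    return set(X86_64_V2_FLAGS - cpu_flags(cpuinfo_text))


def report(path="/proc/cpuinfo"):
    """Print whether the local CPU advertises every x86-64-v2 feature."""
    with open(path) as f:
        missing = missing_v2_flags(f.read())
    if missing:
        print("not x86-64-v2 capable, missing:", ", ".join(sorted(missing)))
    else:
        print("x86-64-v2 capable")
```

Calling `report()` on a machine that prints a non-empty "missing" list would indicate a CPU on which code compiled for x86-64-v2 may die with SIGILL.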

@njsmith
Member

njsmith commented Dec 12, 2024

The immediate fix here is almost certainly to force the manylinux compilers to default to -march=x86-64 somehow, given that the current -march=x86-64-v2 default is, empirically, breaking stuff.

Longer term maybe there's a way for installers to be cleverer or something but that's probably beyond the scope of an issue here.

@alex
Member Author

alex commented Dec 12, 2024

Yeah, I think there's a question about how to force the system compiler to have an -march=....

In terms of installer cleverness, I think that'd require a PEP to standardize a wheel tag for this?

@eli-schwartz

https://git.almalinux.org/rpms/gcc/src/commit/33c7fdbe5394937f20abfaf7a6709864c0fab3d6/gcc.spec#L1163-L1172

It is... fascinating... that they are deploying the compiler with a default -march=... value rather than building all packages for RHEL / Alma using CFLAGS=".... -march=x86-64-v2". This is a pretty bad footgun for building statically linked / libc.so-only code on RHEL in order to deploy standalone binaries on other platforms.

Overriding CFLAGS to include -march=x86-64 should work at least, but only if users don't override CFLAGS, or stick to passing CFLAGS via CFLAGS="$CFLAGS -more-cflags-here". The most robust option will be if manylinux installs a custom gcc / g++ wrapper script:

#!/bin/sh

exec /path/to/real/gcc -march=x86-64 "$@"
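
The same wrapper idea can be sketched in Python to make the argument ordering explicit. This is an illustration only, not what manylinux ships: `REAL_GCC` is a hypothetical environment variable, and `/path/to/real/gcc` is the same placeholder used in the shell script above. The baseline flag goes first so that, assuming GCC lets a later `-march` take precedence over an earlier one, a project that deliberately passes its own `-march` still gets what it asked for.

```python
import os
import sys

# Flag that pins code generation to baseline x86-64.
BASELINE_FLAG = "-march=x86-64"


def wrapped_argv(real_compiler, user_args):
    """Build the argv for the real compiler: baseline flag first, user args after."""
    return [real_compiler, BASELINE_FLAG, *user_args]


def main():
    # REAL_GCC is a hypothetical variable naming the unwrapped compiler;
    # the default is a placeholder, not a real install location.
    real = os.environ.get("REAL_GCC", "/path/to/real/gcc")
    # Replace this process with the real compiler, like `exec` in the shell script.
    os.execv(real, wrapped_argv(real, sys.argv[1:]))
```

Unlike a `CFLAGS` default, this applies even when a build overrides `CFLAGS` entirely, because the flag is injected at the point where the compiler is invoked.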

@mayeut mayeut pinned this issue Dec 14, 2024
@mayeut
Member

mayeut commented Dec 14, 2024

> In terms of installer cleverness, I think that'd require a PEP to standardize a wheel tag for this?

We'd need a PEP to support this properly.
The best that can be done here is what has been suggested by @eli-schwartz: wrap all gcc binaries. This won't solve the issue if someone wants to graft binaries that were installed with dnf install ... but at least, building from sources should be fixed.
It seems usage of x86-64-v2 could be detected at ELF level (at least in some cases) and that should probably go in auditwheel (likely default to failure with a way to override).

@mayeut
Member

mayeut commented Dec 21, 2024

The compiler calls should all be wrapped now.
Next item on my list will be auditwheel.

@mayeut
Member

mayeut commented Feb 2, 2025

> building from sources should be fixed.

Well, it turns out that's not necessarily true. auditwheel tests (local branch for now) are showing that executables with no dependencies are getting tagged as requiring x86-64-v2. This comes from object files linked into the executable:

readelf -a /lib64/crt1.o | grep v2
	x86 ISA needed: x86-64-baseline, x86-64-v2

Other support object files do not set this (nor does their disassembly contain anything that does not look like baseline x86-64).
The /lib64/crt1.o disassembly does not look like it contains things other than baseline x86-64.

> It seems usage of x86-64-v2 could be detected at ELF level (at least in some cases)

This requires building with the GCC option -mneeded, so it's declarative, and probably few projects are using those annotations. auditwheel can detect those and bail out, but it's more likely that things will go undetected without proper annotations (the cryptography wheels do not fail the check, for example).
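
A check along these lines can be sketched by parsing readelf output for the "x86 ISA needed" property shown above. This is not auditwheel's implementation; the function names are hypothetical, and the sketch assumes `readelf -n` (notes only) prints the same property line as the `readelf -a` call used earlier.

```python
import re
import subprocess

# Matches the GNU property line, e.g. "x86 ISA needed: x86-64-baseline, x86-64-v2".
_ISA_NEEDED = re.compile(r"x86 ISA needed:\s*(.+)")


def isa_needed(readelf_output):
    """Collect every ISA level named in 'x86 ISA needed' lines of readelf output."""
    levels = set()
    for line in readelf_output.splitlines():
        match = _ISA_NEEDED.search(line)
        if match:
            parts = (part.strip() for part in match.group(1).split(","))
            levels.update(part for part in parts if part)
    return levels


def requires_above_baseline(readelf_output):
    """True if the annotations declare anything beyond x86-64-baseline."""
    return bool(isa_needed(readelf_output) - {"x86-64-baseline"})


def readelf_notes(path):
    """Run readelf on an ELF file and return its note section dump as text."""
    result = subprocess.run(
        ["readelf", "-n", path], check=True, capture_output=True, text=True
    )
    return result.stdout
```

For example, `requires_above_baseline(readelf_notes("/lib64/crt1.o"))` would be expected to return True on the image described above. A False result only means no annotation was found; as noted, unannotated objects can still contain x86-64-v2 instructions.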

@ofek

ofek commented Feb 11, 2025

We are also affected by this in a different way. Previously we were using these images to produce a prebuilt installation by downloading the x86_64_v3 standalone build and now resolution with pip cannot download some wheels, for example ada-url==1.16.0.

ERROR: Could not find a version that satisfies the requirement ada-url~=1.16.0 (from datadog-agent-dev) (from versions: 1.5.0, 1.6.0, 1.7.0, 1.8.0, 1.10.0, 1.11.0, 1.12.0, 1.13.0, 1.15.0, 1.15.1, 1.15.3)

Does that mean previous images were using v3+ and now it's based on v2 or was there some sort of CPU-agnostic setup before now? I'm not sure how to proceed other than simply using a different image.

FYI this only started happening in the past day or two.

edit: contrary to the title of this issue we started experiencing this on quay.io/pypa/manylinux2014_x86_64

@eli-schwartz

I'm not certain I understand the question. Pip doesn't know or care whether cpython has been compiled with AVX, FMA, MOVBE etc.

What are the relevant manylinux tags available on PyPI, and what glibc version is your image running?

@eli-schwartz

eli-schwartz commented Feb 11, 2025

> now resolution with pip cannot download some wheels

Note that CPU microarchitectures aren't encodable in wheel metadata at all, so cannot affect resolution in any way as far as I'm aware. The only information you're able to encode is:

  • "x86-64 as opposed to aarch64 or i686"
  • glibc or musl, and also, which version of glibc
  • cpython version

If the compiler produces ELF objects that don't actually run on "x86-64" but only run on "x86-64 assuming that gcc -march=native would produce AVX instructions" then that simply results in what pip calls a perfectly valid x86-64 wheel, but which aborts at runtime when you try to import the code. That's why the issue is such a big one, since it "tricks" end user systems into thinking that they can use wheels and then they get a broken virtualenv that is difficult to debug since pip says everything is fine, but the interpreter keeps on crashing.
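
To make the point about tags concrete, here is a minimal sketch that splits a PEP 600 style platform tag (`manylinux_${GLIBCMAJOR}_${GLIBCMINOR}_${ARCH}`) into the fields it carries. The function name is hypothetical and the parser is deliberately narrow: it does not handle legacy aliases such as manylinux2014.

```python
import re

# PEP 600 platform tag layout: manylinux_<glibc major>_<glibc minor>_<arch>.
_MANYLINUX = re.compile(r"^manylinux_(\d+)_(\d+)_(.+)$")


def parse_manylinux_tag(tag):
    """Split a PEP 600 manylinux tag into its glibc version and architecture."""
    match = _MANYLINUX.match(tag)
    if match is None:
        raise ValueError(f"not a PEP 600 manylinux tag: {tag!r}")
    return {
        "glibc": (int(match.group(1)), int(match.group(2))),
        "arch": match.group(3),
    }
```

Parsing `"manylinux_2_34_x86_64"` yields a glibc version of (2, 34) and an architecture of `x86_64`, and nothing else. There is no field that could distinguish baseline x86-64 from x86-64-v2, which is why an installer cannot filter on it.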

@ofek

ofek commented Feb 11, 2025

My apologies, you're right of course. A new release of that package coincided with them dropping older tags.


5 participants