qemu-aarch64 nondeterministic crashes #188
Comments
We have a Ruby app that depends on quite a few "native" extension packages, that is, packages that have to be built with gcc. Targeting … we have consistent … A similar thing happened with a Go project that needed some components built with gcc.
UPDATE: Still investigating, no clarity yet... The issue happens sporadically, and combined with caching it is really difficult to get a good grip on what the cause is and what actually fixes it.
The QEMU that is shipped by this action seemed broken the last time I used it. I recommend trying your base image's QEMU instead.
Indeed, there are some very strange and sporadic issues. QEMU v7 worked well for quite some time, but not anymore: for the past ~1-2 weeks it has been causing sporadic but stubborn segmentation faults in gcc and other executables. Build caching made this very difficult to identify and fix, so I am still not sure how stable the fix is, but
As this is QEMU-related, can you open an issue on https://github.com/tonistiigi/binfmt? Thanks.
There is already a related ticket: tonistiigi/binfmt#215
This affects not only aarch64 on x86_64 but also at least s390x and ppc64le. @crazy-max, I understand the issue is upstream, so I'm not asking to re-open it, but could it at least be pinned so that users of the action can easily find the issue and the workaround while waiting for an upstream fix? For aarch64, another solution is to switch from emulated to native builds now that ubuntu-24.04-arm runners are available: https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/
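The native-runner alternative for aarch64 can be sketched as a job that simply targets the arm64 runner label from the linked changelog. This is a minimal sketch, assuming a public repository with access to `ubuntu-24.04-arm`; the workflow name, job name and build step are hypothetical placeholders. Because the job runs on arm64 hardware, no QEMU emulation is involved at all.

```yaml
# Hypothetical workflow: build natively on an arm64 hosted runner instead of emulating aarch64
name: build-arm64-native
on: [push]
jobs:
  build:
    # Runner label from the GitHub changelog (public preview, public repositories)
    runs-on: ubuntu-24.04-arm
    steps:
      - uses: actions/checkout@v4
      # Placeholder step: should report aarch64 on this runner; replace with the real build
      - run: uname -m
```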
@mayeut, or anyone else: do you have an example of how to use a runner to handle an architecture-specific part of a docker bake?
**Context:** The [aarch64 wheel build CI action has been failing](https://github.com/PennyLaneAI/pennylane-lightning/actions/workflows/wheel_linux_aarch64.yml) since circa 24 Jan 2025. The builds fail with a segmentation fault during the CIBW process. This has also been observed for similar QEMU-based wheel builds in other repositories: docker/setup-qemu-action#188, ssciwr/clang-format-wheel#124, tonistiigi/binfmt#215, tonistiigi/binfmt#165, plus a fix attempt in ssciwr/clang-format-wheel#125. It is caused by the old version (v7) of QEMU that ships with binfmt: `setup-qemu-action` uses the `binfmt:latest` image by default, and that image has not been updated in 2 years.
**Description of the Change:** Use a newer QEMU image (v8) from binfmt.
**Benefits:** aarch64 wheel builds will succeed again, [e.g.](https://github.com/PennyLaneAI/pennylane-lightning/actions/runs/13019772888?pr=1056)
**Possible Drawbacks:**
**Related GitHub Issues:** [sc-83297]
Co-authored-by: ringo-but-quantum <github-ringo-but-quantum@xanadu.ai>
Co-authored-by: Ali Asadi <10773383+maliasadi@users.noreply.github.com>
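Using a newer QEMU image from binfmt comes down to overriding the `image` input of `docker/setup-qemu-action`, whose default is `tonistiigi/binfmt:latest`. A minimal sketch, assuming the `tonistiigi/binfmt:qemu-v8.1.5` tag is the v8 image meant here (the PR text does not name the exact tag) and that only arm64 emulation is needed:

```yaml
steps:
  - name: Set up QEMU
    uses: docker/setup-qemu-action@v3
    with:
      # Assumed tag: pin a QEMU v8 build instead of the default binfmt:latest (QEMU v7)
      image: tonistiigi/binfmt:qemu-v8.1.5
      # Register only the emulator this job needs
      platforms: arm64
```

Note that some comments in this thread report that v8.1.5 did not resolve every case, so this is a mitigation to try rather than a guaranteed fix.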
See: docker/setup-qemu-action#188 (comment) Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
@smoke Hi there. Just sharing a data point: specifying a more recent QEMU version helped in my case. Thanks a lot! For Linux Kernel BPF CI, GitHub Actions runner Docker images for arm64 and s390x are built with
I also saw .NET segfaulting when building an app for s390x in a similar environment; it might also be related.
The run-on-arch action is using docker to run things on various platforms. Docker in turn relies on qemu-user-static installed on the system. Recently there were various reports about multi-arch docker builds failing with seemingly random issues, and it appears to boil down to qemu [1]. I stumbled on this problem while updating s390x runners [2], and setting up more recent version of qemu helped. [1] docker/setup-qemu-action#188 [2] kernel-patches/runner#69 Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
The run-on-arch action is using docker to run things on various platforms. Docker in turn relies on qemu-user-static installed on the system. Recently there were various reports about multi-arch docker builds failing with seemingly random issues, and it appears to boil down to qemu [1]. I stumbled on this problem while updating s390x runners [2], and setting up a more recent version of qemu helped. Install docker buildx and enforce its usage with DOCKER_BUILDKIT=1 [3], so that it is used by the run-on-arch action. [1] docker/setup-qemu-action#188 [2] kernel-patches/runner#69 [3] https://docs.docker.com/build/buildkit/#getting-started Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
run-on-arch-action is simply a wrapper around docker. There is no value in using it in libbpf, as it is not complicated to run non-native arch docker images directly on github-hosted runners. Docker relies on qemu-user-static installed on the system to emulate different architectures. Recently there were various reports about multi-arch docker builds failing with seemingly random issues, and it appears to boil down to qemu [1]. I stumbled on this problem while updating s390x runners [2] for BPF CI, and setting up more recent version of qemu helped. This change addresses recent build failures on s390x and ppc64le. [1] docker/setup-qemu-action#188 [2] kernel-patches/runner#69 [3] https://docs.docker.com/build/buildkit/#getting-started Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
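Running a non-native image directly, without run-on-arch-action, can be sketched as two workflow steps: register the emulator, then start the container with an explicit platform. This assumes QEMU is registered via `docker/setup-qemu-action` and uses the multi-arch `ubuntu:24.04` image purely as an illustration; the image and command are not taken from the libbpf change.

```yaml
steps:
  # Register the s390x emulator with binfmt_misc
  - uses: docker/setup-qemu-action@v3
    with:
      platforms: s390x
  # Docker pulls the s390x variant of the image; the kernel hands its binaries to QEMU
  - run: docker run --rm --platform linux/s390x ubuntu:24.04 uname -m
```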
* Update main.yml docker/setup-qemu-action#188
I'm sorry for commenting on a closed issue. QEMU v8.1.5 seems not to address the problem with arm64v8, at least in the run I did. It keeps segfaulting when running some commands.
Run qemu on Ubuntu 22.04
See also:
- actions/runner-images#11471
- docker/setup-qemu-action#188
- docker/setup-qemu-action#198
Upgrading to QEMU v8.1.5 doesn't seem to help, so run on Ubuntu 22.04 instead. Closes #1529.
I ran the CI multiple times and it always worked, so I think this downgrade really "fixes" the issue.
These changes are made under both the "Apache 2.0" and the "GNU Lesser General Public License 2.1 or later" license terms (dual license). SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later
…QEMU-related crash on AArch64. See docker/setup-qemu-action#188
Description
I wanted to report this in the hope that it saves people time: I recently switched a build pipeline running on Ubuntu to Ubuntu's own qemu-user-static package. By the way, the Ubuntu QEMU is a newer version than the one installed by this action.
We were seeing qemu-aarch64, as installed by this action, crash nondeterministically on GitHub's x86-64 runners as well as on self-hosted runners with 12th and 13th gen Intel CPUs. We don't have time to triage and diagnose the issue, and because it's nondeterministic, it's not clear what the underlying cause is.
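Switching to the distribution's emulator might look like the step below. This is a sketch under assumptions: an Ubuntu runner on which installing Ubuntu's `qemu-user-static` package also registers its binfmt_misc handlers (expected on systemd-based images, but worth verifying), and the step replaces `docker/setup-qemu-action` rather than running alongside it, since both register handlers for the same architectures.

```yaml
steps:
  # Use Ubuntu's qemu-user-static instead of the tonistiigi/binfmt image installed by this action
  - run: |
      sudo apt-get update
      sudo apt-get install -y qemu-user-static
      # List registered binfmt_misc handlers; qemu-* entries should appear
      ls /proc/sys/fs/binfmt_misc/
      # Smoke test: run an arm64 container through the distro QEMU
      docker run --rm --platform linux/arm64 ubuntu:24.04 uname -m
```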
Expected behaviour
qemu-aarch64 doesn't crash at random.
Actual behaviour
qemu-aarch64 crashes at random.
Repository URL
No response
Workflow run URL
No response
YAML workflow
n/a
Workflow logs
No response
BuildKit logs
Additional info
No response