Cannot run tests in CI #47

Open
Tracked by #86
aaronmondal opened this issue Mar 24, 2023 · 4 comments
Labels: bug (Something isn't working)

Comments

@aaronmondal
Contributor

Attempts to run the tests in CI via remote execution currently don't work because Bazel doesn't like to run in a nix-built container. build and run work, but test doesn't, most likely due to bazelbuild/bazel#12579.

Technically it's already decent coverage if just builds pass, but many issues arise from dynamic linking behavior and are only visible during runtime. So at the moment we'd either have to run all examples manually without the ll_test wrappers, or only run a bazel build cpp without running anything.

Another option would be to build a custom Bazel which we distribute as part of rules_ll. Building a custom Bazel against an LLVM toolchain and statically linking libc++ could be an option that keeps things portable between CI and regular usage, but it might lead to issues for non-nix workflows.

@JannisFengler @SpamDoodler @jaroeichler What do you think? Statically linking Bazel with libc++ would add a few MB to all images, caches, the devenv, etc., because we'd have duplicate libc++ functions in every subbinary, and we'd have to think about infrastructure to keep up with the upstream Bazel sources. That would make it easier to get remote execution to work, though. Do we want to go down that path or should we try to find another solution?

@SpamDoodler
Contributor

Personally, I think we actually want to have a custom-built Bazel. This would give us even better control over the whole build environment.
Concerning the issues you raised, I think it's ok to prioritize the nix workflow for now, since I don't see a huge drawback with using nix. The few MB for libc++ should be no problem either, since storage is quite inexpensive.
I think it is the spirit of rules_ll to provide the most advanced toolchain possible, so I'm happy to prioritize remote execution over those minor inconveniences.

@jaroeichler
Contributor

Statically linking libc++ is fine; those few MB are negligible in comparison to the cache and nix environment size. We should keep dynamic linking in mind if image size becomes an issue in the future.

Yes, we should aim for a custom-built Bazel; this aligns well with the rest of the rules_ll project. Could you go into further detail on how you want to handle the patching and building of Bazel?

@aaronmondal
Contributor Author

@jaroeichler I initially tried just patching the RPATHs with patchelf, but then bazel refuses to operate. Probably for security reasons.

My current plan is:

  • Write a nix package that builds Bazel via the non-upstream LLVM toolchain from nixpkgs and statically links libc++ into it.
  • Distribute that Bazel in a way that is compatible with Bazelisk's custom release mechanism outlined here.
  • Fetch the binary in our remote execution images via bazelisk.

If things work as I intend, we'd end up with remote execution images that no longer require libstdc++ or any gcc-toolchain parts. If we can reference these custom Bazel binaries in .bazelversion, this approach should also be portable to non-nix users as long as the LLVM toolchain parts are statically linked.

As an interesting side note, we could also try to statically link libmusl into that release to create a fat binary that is independent of the host's glibc version. But let's leave this for later when things actually work 😅

@aaronmondal aaronmondal self-assigned this Apr 10, 2023
@aaronmondal aaronmondal added the bug Something isn't working label Apr 11, 2023
@aaronmondal aaronmondal mentioned this issue Apr 15, 2023
6 tasks
@aaronmondal
Contributor Author

Ok, remote execution works, so we could run tests in CI. But that might be really expensive. A single build with near-perfect cache reuse (which we basically always have) still needs ~2GB of artifacts to operate (makes sense, since building a single target requires the tools from the ll_toolchain, which is roughly that size). At 1 commit per day this is ~60GB per month just for the main branch. This does not include any PR testing etc. This is also a minimum value. For instance, updating LLVM alone, which requires a full cache rebuild and a few revisions, might need many times more than that.

We probably still need a fraction of the resources that others would need for a similar setup, but it's still a big setup. We might be better off hosting our own remote exec cluster.
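For anyone who wants to play with the numbers, here is a rough back-of-the-envelope sketch of the estimate above. It assumes the ~2GB per build figure and treats the ~60GB as a 30-day total; the function name and the PR build count in the second call are hypothetical and only there for illustration, not measured values.

```python
def artifact_gb(gb_per_build: float, builds_per_day: float, days: int = 30) -> float:
    """Estimate artifact volume in GB over a period.

    Assumes every build needs roughly the same amount of artifacts,
    which ignores full cache rebuilds such as an LLVM update.
    """
    return gb_per_build * builds_per_day * days


# Main branch only: ~2GB per build, 1 commit per day, 30 days (assumed period).
main_only = artifact_gb(2.0, 1)

# Hypothetical: main branch plus 5 PR builds per day (made-up number).
with_prs = artifact_gb(2.0, 1 + 5)

print(f"main branch only: ~{main_only:.0f} GB")
print(f"with hypothetical PR builds: ~{with_prs:.0f} GB")
```

This is a lower bound in the same sense as the estimate in the comment: it scales linearly with the number of builds and does not model the spikes caused by toolchain updates.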
