Discussion: CI resources #780

Closed
alanz opened this issue Dec 31, 2020 · 17 comments
Labels
old_type: meta Planing and organizing other issues status: in discussion Not actionable, because discussion is still ongoing or there's no decision yet type: support User support tickets, questions, help with setup etc.

Comments

@alanz (Collaborator) commented Dec 31, 2020

At the moment CI is becoming a major bottleneck for landing PRs.

Given that HLS is the first point of contact for many people with Haskell, I believe it is critical that we test as many different configuration options as possible, preferably on a continuous basis.

This goes along with a second important point: it must be easy to make a release once a month, ideally involving nothing more than polishing a changelog and tagging.

We are hitting the limits of the free resources available to OSS projects.

The Haskell Foundation is now a thing, and there is funding for the GHC infrastructure. An initial discussion on IRC with @bgamari indicates that we may be able to host the HLS CI on the GHC gitlab infrastructure (keeping dev in GH, as now).

Is this something we should explore?

@alanz added the type: support, old_type: meta, and status: in discussion labels on Dec 31, 2020
@jneira (Member) commented Jan 1, 2021

CI infra considerations apart, we should review our CI and try to cut jobs that would only catch errors with almost no probability of being triggered,
e.g. only test the latest minor versions of GHC, like we did in ghcide.
Maybe @pepeiborra can help us see where we can drop more jobs (but not too many 😝)

@bgamari commented Jan 1, 2021

For what it's worth, I have put together a quick hack to act as a basis for discussion: https://gitlab.haskell.org/bgamari/haskell-language-server/-/merge_requests/1. At the moment it is running about 8 concurrent jobs, with each taking around 10 minutes (although this should be shorter once the cabal caches are warmed). GHC CI load fluctuates pretty significantly, but we have some capacity set aside for "short" jobs (e.g. not GHC builds, which generally take ~90 minutes). One area where we are currently lacking is Darwin, although this will improve soon as we bring up a new set of donated runners.

Note that GitLab can be used in a CI only mode for a GitHub repository (although I'll admit I've not used it in this way yet).

@bgamari commented Jan 1, 2021

Another thing that really ought to be considered regardless of where CI happens: Currently it appears that benchmarks are the largest contributor to CI build time. However, these really don't need to be run on every platform. The sorts of performance regressions that haskell-language-server will feel are almost certainly platform independent.

@jneira (Member) commented Jan 1, 2021

It is too late now, but something to take into account for future decisions: by merging ghcide into hls we lost a whole queue of free jobs, including its functional test suite and benchmarks.
Maybe we should review what we are testing in both suites and remove redundant tests.

@jneira (Member) commented Jan 1, 2021

Another thing that really ought to be considered regardless of where CI happens: Currently it appears that benchmarks are the largest contributor to CI build time. However, these really don't need to be run on every platform. The sorts of performance regressions that haskell-language-server will feel are almost certainly platform independent.

Not sure about Linux and macOS, but performance on Windows could quite likely deviate, even in hls, taking into account that we are quite close to GHC, no?

@jneira (Member) commented Jan 1, 2021

nvm, Windows is the one that could deviate the most, but it is the only one without benchmarks 🤔
Maybe we could drop the macOS benchmarks.

@pepeiborra (Collaborator) commented

A major problem with GitHub Actions is that it has no native support for cancelling redundant builds

https://github.heygears.community/t/github-actions-cancel-redundant-builds-not-solved/16025

Right now, 95% of the jobs in the action queue are redundant, since they refer to commits that are below the tip of their branch, or since the branch is out of date with master.

There are a bunch of action providers that claim to solve this problem, so I think we need to pick and adopt one ASAP

@pepeiborra (Collaborator) commented

Summary of actions I have taken during the last couple of days:

  • Halved the number of GitHub Actions workflows from 33 to 17
  • Streamlined the CircleCI jobs by removing the Cabal task and skipping testing that is already covered by GitHub Actions
  • Enabled auto-cancel of redundant jobs in CircleCI and triggering only on PRs

The attempts to enable auto-cancel of redundant jobs in GitHub Actions have been unsuccessful. Since it is not a core feature, it must be scripted in an action provider. But those scripts need a GITHUB_TOKEN with write access in order to be able to cancel past jobs, and the issue is that PRs from forked repos do not get such write access, whereas PRs from local branches do. This can be changed at the org level /cc @jaspervdj but it raises security concerns.
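For reference, the cancellation those provider scripts perform is roughly the sketch below, against the Actions REST API (list the in-progress runs for a branch, keep the newest, cancel the rest). `OWNER`, `REPO` and `BRANCH` are placeholders, and the write-token requirement is exactly the one described above:

```bash
#!/usr/bin/env bash
# Sketch only: roughly what the "cancel redundant builds" helpers have to do.
# OWNER, REPO and BRANCH are placeholders; GITHUB_TOKEN needs write access,
# which is the limitation for PRs from forks discussed above.
set -euo pipefail

api="https://api.github.com/repos/$OWNER/$REPO/actions/runs"

# In-progress runs for the branch, newest first; keep the tip, collect the rest.
stale=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
    "$api?branch=$BRANCH&status=in_progress" |
  jq -r '.workflow_runs | sort_by(.created_at) | reverse | .[1:] | .[].id')

for id in $stale; do
  curl -s -X POST -H "Authorization: token $GITHUB_TOKEN" "$api/$id/cancel"
done
```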

It's likely that our Github CI will continue to be overwhelmed until we manage to find a solution for redundant builds

@pepeiborra (Collaborator) commented

Things left to do, as I have run out of time to work on this myself:

  1. Fix the GitHub Actions cache on Windows. It looks like we run over the limit of 5GB relatively easily, so keep this in mind
  2. Find a solution for auto-canceling redundant builds or move away from GitHub Actions. Both CircleCI and Azure workflows support this as a core feature

@jneira (Member) commented Jan 4, 2021

Fix the GitHub Actions cache on Windows. It looks like we run over the limit of 5GB relatively easily, so keep this in mind

I am investigating how to fix the Windows cache, testing only the latest minor versions, and restoring the macOS builds (hopefully the latter will give us more time than the former will cost): master...jneira:test-last-versions
Alternatively, we could enable only builds for macOS.

@hazelweakly commented

I like to joke that the most effective way to improve performance in CI is to either do less work or do less work. What I mean by that is you can either run less or run it less frequently. Doing both will be good for the long term.

I looked through all of the workflows and I have a few thoughts on how to accomplish both, listed in order of bang-for-buck in time:

  1. Run less stuff. Currently everything runs on every single commit of every PR out there.

    • The first change I would suggest is to switch all of the benchmarking to manual trigger instead. People can trigger benchmarks when they want (ie when they have working code to benchmark) and it vastly reduces the amount of churn that goes through the cache.
    • The nix workflow can probably be run much less often than it is.
    • The second would be to only build a few versions of the LSP until "ready". Maybe as little as just the latest, maybe the latest 3 major versions, maybe just the latest on Windows/macOS and 3 versions on Linux? Lots of wiggle room there. There's a surprising amount of code that only ends up needing to be (initially) tested on a single platform in practice.
  2. Every single workflow tests everything. Which doesn't scale well as you can tell :)

    There are 13 packages in cabal.project and in an ideal world each one has a separate set of tests, benchmarks, etc., and the minimal subset is selected to run every time. In a really ideal world, caching everything and letting cabal do the thing would give you this property; however, git + transient caches + VMs + etc... just makes it way too hard for a build system to reliably do change detection like that. Even teams using build systems like Bazel (designed for this exact purpose) end up having to do partial targets on CI systems.

    • Another great change to think about is only building/testing parts that you need to. The granularity doesn't have to be very high here, but the payoff can be great. Drop dependency on shake for install.hs #63 is a great example of a PR that needs nothing else to run in its CI. Plugins are another one that could be an easy win (if that makes sense for how PRs are usually scoped).

More extreme savings are possible by switching the build system to something like Bazel and investing in dedicated CI infrastructure that can git checkout into bare metal with no state cleared between runs. Git fetches get way faster, caches don't have to be shoved into/out of the cloud, and the machines are much faster. I don't really recommend this approach because it makes the tooling way less approachable for contributors and you're going to end up having to do all of the above anyway, so you might as well milk the mileage out of the low hanging fruit first before considering something like Bazel.

@pepeiborra (Collaborator) commented

Thanks Jared for your analysis; I agree with the first point. I would say:

  1. Keep one of the benchmark jobs. I do check the benchmark results out of habit and there is a task to automate the check
  2. Keep one of the Nix jobs, it runs in <15 minutes and helps to keep the scripts working

On your second point, I'm not sure testing everything is the problem. The test workflow is fail-fast, so the whole matrix is cancelled as soon as anything fails. The test suites run in 20 minutes combined, which is not too bad.

The real problem imho is redundant builds, those can easily overwhelm the CI pipeline.

But all that said, the Mergify bot seems to be working wonders already

@hazelweakly commented

Redundant builds are definitely a huge issue. We run into those quite a bit and I wish I had a better way to reliably control those. So getting those down is definitely a large win for usability, and Mergify is an excellent tool to help with that as well.

Even if the test suites run in 20 minutes combined, if that can be reduced from 20 minutes (on average) across, say, 10-20 jobs to sometimes as low as 5-10 minutes across 2-5 jobs for smaller PRs, that has a surprisingly large cascading effect on the throughput of the overall CI queue. At the very least, it's worth thinking about for environmental concerns if the developer UX ones aren't compelling enough on their own.

@jneira (Member) commented Jan 19, 2021

@jared-w many thanks for your insights.
I think we have already reduced the work done to an acceptable set, with a good balance (for now) between coverage, valuable info, and time spent.

So the other big point to be considered would be:

Every single workflow tests everything. Which doesn't scale well as you can tell :)

Another great change to think about is only building/testing parts that you need to. The granularity doesn't have to be very high here, but the payoff can be great. #63 is a great example of a PR that needs nothing else to run in its CI. Plugins are another one that could be an easy win (if that makes sense for how PRs are usually scoped).

We had ghcide and hls separated in the past, and that, in addition to giving us double the CI resources, made it easier to test each one separately (hls depending on ghcide).
It would be great if CI ran tests only for the components effectively changed in a PR (and their dependants). But:

  • all plugins are tested in the same test suite, a known point for improvement: Test infrastructure for plugins #576
    • but we could still run specific tests using tasty's -pattern, as in the sketch after this list
  • we should set up a configuration able to detect which components changed, and their dependencies, to run only the relevant tests (to investigate)
    • maybe a starting point could be to manually use labels or commit keywords (like [ci-skip], but extended to mark which components you want to test). The problem is that it would be error-prone.
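To make the tasty -pattern point concrete, a minimal sketch; the func-test target name and the "Eval" pattern are illustrative assumptions, not the actual CI setup:

```bash
# Sketch: run only the tests whose names match one plugin, by passing tasty's
# -p/--pattern filter through cabal. The target and pattern are illustrative.
cabal test haskell-language-server:func-test --test-options='-p "Eval"'
```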

@hazelweakly commented

One pattern that's been proven to work in other projects is to run a get-targets type of script that will analyze git metadata, perhaps PR name, or whatever else you want to use and have it output the relevant subset of stuff to run. That could take the form of cabal test -- -pattern $(./get-targets), or for target in $(./get-targets); do cabal test $target; done, or a lot of other options.

I'd lightly suggest get-targets be written in a very lightweight language; probably bash, or python. The last thing you want is to add a few minutes to CI time in an effort to save CI time, or run into annoying bootstrapping problems. (Although once the logic gets worked out, if it doesn't really ever change, curling a built binary from "somewhere" is not the worst solution)
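A hypothetical get-targets along those lines, just to illustrate the shape of the idea (the path-to-target mapping is a guess at the repo layout, not something that exists today):

```bash
#!/usr/bin/env bash
# Hypothetical get-targets: map the files changed against master to cabal targets.
# The directory patterns and target names are illustrative guesses.
set -euo pipefail

targets=""
while read -r path; do
  case "$path" in
    ghcide/*)        targets="$targets ghcide" ;;
    plugins/hls-*)   targets="$targets $(echo "$path" | cut -d/ -f2)" ;;
    src/*|test/*)    targets="$targets haskell-language-server" ;;
    *)               targets="ghcide haskell-language-server"; break ;;
  esac
done < <(git diff --name-only origin/master...HEAD)

# One target per line, de-duplicated, ready for:
#   for target in $(./get-targets); do cabal test "$target"; done
echo "$targets" | tr ' ' '\n' | sort -u | sed '/^$/d'
```

Anything not obviously scoped falls back to testing everything, which keeps a script like this safe even when the mapping is incomplete.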

@jneira (Member) commented Oct 4, 2021

Things have been alleviated:

  • we have separated the ghcide test runs from the plugin ones (changes in plugins cancel the ghcide test suite)
  • from the 20 concurrent runners shared by the entire haskell org, we now have 180 job runners, thanks to GitHub.

However, a tool like the one @jared-w described would still be great, to allow a more fine-grained selection of which tests to run (changes in one plugin should not trigger tests in the rest).

@jneira (Member) commented Dec 16, 2021

I think we can close this issue; CI performance is pretty reasonable nowadays with all the changes made since then to the GitHub workflows and CircleCI.
Thanks all for the suggestions!

@jneira closed this as completed Dec 16, 2021