github-actions executable updated itself #187009

Closed
2 tasks
aanderse opened this issue Aug 16, 2022 · 28 comments
Labels
0.kind: bug Something is broken

Comments

@aanderse
Member

Describe the bug

I was running the services.github-runner service when the runner managed to update itself, and it no longer works.

Even though our NixOS service passes the --disableupdate flag, the runner will still attempt to update itself 30 days after any new release is out. See https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/ for details.
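
To make the timing concrete, here is a minimal sketch of the 30-day window described above. It is not taken from the runner's code: the function names are hypothetical, and treating the 30th day itself as already past the window is an assumption.

```python
from datetime import date, timedelta

# Assumption taken from the report above: the runner starts trying to
# update itself 30 days after a newer release has been published.
GRACE_PERIOD = timedelta(days=30)


def forced_update_date(newer_release_published: date) -> date:
    """First day on which a self-update attempt is expected (hypothetical helper)."""
    return newer_release_published + GRACE_PERIOD


def is_past_grace_period(newer_release_published: date, today: date) -> bool:
    """True once the packaged runner has been outdated for 30 days or more."""
    return today >= forced_update_date(newer_release_published)
```

With a check like this, a package that lags a newer upstream release by a month or more would be flagged as likely to hit the behavior reported here, which is the motivation for always backporting updates.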

TODO:

  • Document that all updates to the github-runner package should be backported, always, probably without exception (similar to how we always update youtube-dl and the policies we have that allow for this)
  • Add a warning/note to services.github-runner.enable (or some other suitable place) that users need to keep up to date with the latest software

Notify maintainers

@veehaitch @newAM

cc @winterqt because she's always so helpful and I appreciated her discussing this with me 😄

@aanderse aanderse added the 0.kind: bug Something is broken label Aug 16, 2022
@winterqt
Member

For what it's worth: according to GitHub's documentation on the subject, and my perusing of the code, this behavior does not seem intentional, and the --disableupdate flag should work as we expect it to (and not be bypassed after 30 days). My rationale is that the runner wouldn't include a whole "warn if runner is out-of-date" flow if upstream wanted the flag to simply be bypassed in that case.

@aanderse Can you reliably reproduce this? If so, we should probably file a bug upstream.

@aanderse
Member Author

I'm controlling versions with flakes so in theory I have everything I need to easily create a reproducible example. I'll try to find some time to do this.

@yayayayaka
Member

yayayayaka commented Aug 16, 2022

I coincidentally just updated github-runner on release-22.05:

Aug 16 19:29:07 build-worker-03 systemd[1]: Starting GitHub Actions runner...
Aug 16 19:29:07 build-worker-03 m9qqfnqdhn1cpp7vxqs24dm4azgcym2v-github-runner-unconfigure.sh[6855]: Config has changed, removing old runner state.
Aug 16 19:29:07 build-worker-03 m9qqfnqdhn1cpp7vxqs24dm4azgcym2v-github-runner-unconfigure.sh[6855]: The old runner will still appear in the GitHub Actions UI. You have to remove it manually.
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6861]: Configuring GitHub Actions Runner
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6890]: touch: cannot touch '.env': Read-only file system
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6864]: ./env.sh: line 37: .path: Read-only file system
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6864]: ./env.sh: line 32: .env: Read-only file system
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6864]: ./env.sh: line 32: .env: Read-only file system
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: --------------------------------------------------------------------------------
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |        ____ _ _   _   _       _          _        _   _                      |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |                                                                              |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |                       Self-hosted runner registration                        |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: |                                                                              |
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: --------------------------------------------------------------------------------
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: # Authentication
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: Http response code: NotFound from 'POST https://api.github.com/actions/runner-registration'
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: {"message":"Not Found","documentation_url":"https://docs.github.com/rest"}
Aug 16 19:29:07 build-worker-03 zgxrnwpav3figy05rkhggyyr5bykl220-github-runner-configure.sh[6891]: Response status code does not indicate success: 404 (Not Found).
Aug 16 19:29:07 build-worker-03 systemd[1]: github-runner.service: Control process exited, code=exited, status=1/FAILURE
Aug 16 19:29:07 build-worker-03 systemd[1]: github-runner.service: Failed with result 'exit-code'.
Aug 16 19:29:07 build-worker-03 systemd[1]: Failed to start GitHub Actions runner.
Aug 16 19:29:07 build-worker-03 systemd[1]: github-runner.service: Consumed 441ms CPU time, received 4.3K IP traffic, sent 1.0K IP traffic.

Is this related to this bug or rather a candidate for a separate bug report?

@winterqt
Member

Looks like this is an unrelated bug, @yayayayaka -- unless your runner self-updated beforehand. Do your logs show that?

@yayayayaka
Member

Oh, indeed it did. I just wasn't looking back far enough in the journal.

github-runner-log.txt

@newAM
Member

newAM commented Aug 17, 2022

Is this related to this bug or rather a candidate for a separate bug report?

Separate bug. I saw this occur here: #182189 (review)

But I was not able to reproduce it when I dug into it a few days later. I assumed it was a PEBKAC with the way I tested it the first time 😥

Edit: just read the rest of the comments, maybe not a separate bug then? I'll dig in more later

@veehaitch
Member

I think it is a good guideline to always backport. We could also think about applying the patch again that I used before the --disableupdate flag was introduced upstream.

@winterqt
Member

I think it is a good guideline to always backport.

I think we can all agree on this, given the nature of the package.

I think we should look more into this and inform upstream at some point, falling back to that patch only as a last resort. Faking the version data sent to GitHub's servers is probably not the best idea given how the program works, so we should strive to avoid it if we can. For example, if GitHub conditionally sends something based on the version of the runner, our nonexistent high version number could cause it to send data that the older runner we actually ship cannot handle. (Even though upstream warns against not updating for long periods of time, this is definitely still something to consider.)

(Hopefully that made sense 😅)

@veehaitch
Member

Definitely makes sense! This is why I removed the patch as soon as the --disableupdate flag came out.

@newAM
Member

newAM commented Aug 18, 2022

Is this related to this bug or rather a candidate for a separate bug report?

Separate bug. I saw this occur here: #182189 (review)

But I was not able to reproduce it when I dug into it a few days later. I assumed it was a PEBKAC with the way I tested it the first time 😥

Edit: just read the rest of the comments, maybe not a separate bug then? I'll dig in more later

Turns out this is a separate issue, or this symptom can appear as a result of more than this self-updating issue. I experienced this again after updating nix, which included the latest github-runner package updates. My runner logs show no signs of attempting to self-update. Strangely this was only experienced with my x86_64-linux runners; my aarch64-linux runners have no problems, even with identical service configuration.

I'll file an issue for this when I have time to describe it more intelligently than "it no work". If anyone else has time to triage before then, please file an issue 😄

@aanderse
Member Author

aanderse commented Sep 5, 2022

As mentioned in #189203 (comment) I continue to run into issues where github-runner attempts to update itself and refuses to run because it cannot. Is anyone else having these issues?

@newAM
Member

newAM commented Sep 5, 2022

We could also think about applying the patch again that I used before the --disableupdate flag was introduced upstream.

I do not have any issues with my runners, but I also have this patch in my NixOS overlays. I think it would be a good idea to reapply this patch.

Another possible solution is to let this run in an FHS environment and update itself.

@aanderse
Member Author

aanderse commented Sep 5, 2022

but I also have this patch in my NixOS overlays.

When did you put this patch in your overlays? Do you have automation to keep up with upstream releases in your local deploys, or do you just keep an eye out and manually update ASAP?

Another possible solution is to let this run in an FHS environment and update itself.

Or run Ubuntu, but neither of those options sounds appealing 😉

@newAM
Member

newAM commented Sep 5, 2022

When did you put this patch in your overlays? Do you have automation to keep up with upstream releases in your local deploys, or you just keep an eye out and manually update ASAP?

I added it when @veehaitch made the comment, seemed like a good idea to keep things working.

I do also have automation to update the NixOS flake for all my systems.

@aanderse
Member Author

aanderse commented Sep 5, 2022

but I also have this patch in my NixOS overlays.

I realize I completely misunderstood what you meant by this. You meant the --disableupdate related patches.

I think it would be a good idea to reapply this patch.

I agree. Are you able to make a PR? We could use that to resolve this issue.

@newAM
Member

newAM commented Sep 5, 2022

Opened a PR at #189909

@winterqt
Member

winterqt commented Sep 5, 2022

I still think we should try to open an issue upstream, as this seems like a bug they'd want to address. Keeping this patch applied long-term is probably not the best idea, so we should see what upstream thinks.

@winterqt
Member

winterqt commented Oct 9, 2022

Does someone want to open an issue, or should I?

@mkaito
Contributor

mkaito commented Oct 15, 2022

Is this what we want? actions/runner#2201

@mkaito
Contributor

mkaito commented Oct 15, 2022

Either way, updates for the package stopped being backported to 22.05 due to merge conflicts. Combine that with this option being ignored, and GHA is basically completely broken atm in stable.

@winterqt
Member

Is this what we want? actions/runner#2201

Sadly, no -- we call config.sh directly.

I'm going to go ahead and open an issue upstream for this myself (in a few hours from now).

@aanderse
Member Author

Sorry for the late reply. Thank you so much @winterqt!

@domenkozar
Member

Is this fixed?

@aanderse
Member Author

I'm not sure... I set up automatic updates to run daily, so I can't tell.

@Profpatsch
Member

We are still getting this every time the config changes and the script tries to re-authenticate:

unconfigure.sh[2728540]: Config has changed, removing old runner state.
unconfigure.sh[2728540]: The old runner will still appear in the GitHub Actions UI. You have to remove it manually.
configure.sh[2728547]: Configuring GitHub Actions Runner
configure.sh[2728578]: --------------------------------------------------------------------------------
configure.sh[2728578]: |        ____ _ _   _   _       _          _        _   _                      |
configure.sh[2728578]: |       / ___(_) |_| | | |_   _| |__      / \   ___| |_(_) ___  _ __  ___      |
configure.sh[2728578]: |      | |  _| | __| |_| | | | | '_ \    / _ \ / __| __| |/ _ \| '_ \/ __|     |
configure.sh[2728578]: |      | |_| | | |_|  _  | |_| | |_) |  / ___ \ (__| |_| | (_) | | | \__ \     |
configure.sh[2728578]: |       \____|_|\__|_| |_|\__,_|_.__/  /_/   \_\___|\__|_|\___/|_| |_|___/     |
configure.sh[2728578]: |                                                                              |
configure.sh[2728578]: |                       Self-hosted runner registration                        |
configure.sh[2728578]: |                                                                              |
configure.sh[2728578]: --------------------------------------------------------------------------------
configure.sh[2728578]: # Authentication
configure.sh[2728578]: Http response code: NotFound from 'POST https://api.github.com/actions/runner-registration' (Request Id: AD52:129AD:34AAA82:3594C4E:63F669>
configure.sh[2728578]: {"message":"Not Found","documentation_url":"https://docs.github.com/rest"}
configure.sh[2728578]: Response status code does not indicate success: 404 (Not Found).

@veehaitch
Member

@Profpatsch maybe related to #217721

@Profpatsch
Member

@veehaitch I think we figured out what caused this: runner registration tokens are only active for a short time. After switching to fine-grained PATs, the problem is fixed.

I created a PR which improves the documentation of the module in this regard: #217827
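
To illustrate why a short-lived registration token fails when the configure script re-runs later, here is a minimal sketch. It assumes the token comes with an ISO 8601 `expires_at` timestamp carrying an explicit UTC offset, as GitHub's registration-token API response does as far as I know; the function name and the five-minute safety margin are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def token_is_usable(expires_at: str, now: datetime,
                    margin: timedelta = timedelta(minutes=5)) -> bool:
    """Return True if the token is still valid with some safety margin left.

    `expires_at` must include a UTC offset (e.g. "+00:00") and `now` must be
    timezone-aware; otherwise the comparison raises TypeError.
    """
    expiry = datetime.fromisoformat(expires_at)
    return now + margin < expiry
```

A token that was valid at the first deployment would fail this check by the time a later config change triggers re-registration, which matches the 404 shown in the logs above. A PAT does not expire on that short schedule, so re-registration keeps working.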

@Profpatsch
Member

Regarding the original issue brought up in this thread, we’ve been running the runner for over a year, and it has never tried to update itself in that time.

So I would close this, feel free to reopen.

8 participants