Flatcar machines stuck in Waiting... instead of pulling new release(s) #853

Open
tylerauerbeck opened this issue Oct 2, 2024 · 2 comments

Comments

@tylerauerbeck

Description

After flipping the pin to the latest stable, a number of machines pulled down the latest release. We had paused downloads/reboots over the weekend and came back a few days later to begin again, and now we have a number of machines that are stuck in a Current Status of Waiting.... When looking at the update_engine logs, the only mention I see is omaha_request_action.cc:629] HTTP reported success but Omaha reports an error.. When I look for a matching error in the Nebraska logs for that machine ID, the only log I see is update complete.error. Is there any additional logging I can turn up to determine the actual root of this problem? I've tried things like restarting update_engine, Nebraska, etc. to see if I can get things unstuck, without any luck.

Impact

Further downloads are not occurring and current_status is not accurately reflecting the status of this rollout.

Environment and steps to reproduce

  1. Set-up: Nebraska 2.9 attempting to roll out 3975.2.1
  2. Task: Flipped channel pin to 3975.2.1 to begin rollout
  3. Action(s): Updated the pin to begin the rollout, then paused updates of machines over a span of 2+ days (resulting in machines staying in the Downloaded state for a period of time before we attempted to continue the rollout)
  4. Error:
  • omaha_request_action.cc:629] HTTP reported success but Omaha reports an error.
  • update complete.error

Expected behavior

Nebraska accurately reflects the current status, and additional nodes continue to download the new release.

Additional information

N/A

@tylerauerbeck
Author

If it helps at all, the nodes in Waiting... seem to bunch up under On Hold in the dashboard for the channel.

@ErvinRacz
Contributor

ErvinRacz commented Jan 16, 2025

Thank you for taking the time to report the issue, @tylerauerbeck!

I noticed it was reported a couple of months ago, and I wanted to check in to see if it's still relevant.

I just started to learn how the update server works a few weeks ago, but I once had a similar experience when I was testing the update policy settings and reboot strategies: the status of the nodes got stuck in the same Waiting... status until I realized that I had turned off the automatic reboot strategy.

The following questions may help us investigate what happened:

  • Were there any nodes that were successfully updated before the "pause"?
  • What do you mean by pause? Turning off reboots, or disabling updates from the Nebraska UI?
  • The reboot strategy can be checked in the following files:
/usr/share/flatcar/update.conf
/etc/flatcar/update.conf <--- overrides the previous file's settings
  • update_engine_client -status is helpful to check the status of the update process on an individual node
  • What was the update policy set in Nebraska for the group?
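
In case it helps with checking the reboot strategy on several nodes, here is a rough Python sketch that merges the two update.conf files in the order listed above, so that /etc/flatcar/update.conf wins. It assumes the files contain plain KEY=VALUE lines and that the relevant key is named REBOOT_STRATEGY; the function names are made up for illustration, so please treat it as a starting point rather than a supported tool.

```python
from pathlib import Path

# Paths from the list above; later files override earlier ones.
CONF_PATHS = ["/usr/share/flatcar/update.conf", "/etc/flatcar/update.conf"]


def parse_key_values(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def effective_config(paths=CONF_PATHS):
    """Merge the files in order so later files win; missing files are skipped."""
    merged = {}
    for path in paths:
        p = Path(path)
        if p.is_file():
            merged.update(parse_key_values(p.read_text()))
    return merged


if __name__ == "__main__":
    conf = effective_config()
    # Assumption: REBOOT_STRATEGY is the key that controls automatic reboots.
    print("REBOOT_STRATEGY =", conf.get("REBOOT_STRATEGY", "<not set>"))
```

Running it on a node prints the merged value, which can then be compared with what update_engine_client -status reports on that node.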

Labels
None yet
Projects
Status: 📝 Needs Triage

2 participants