Greenboot does not fail when first error is found #143

Open
dhensel-rh opened this issue Jul 10, 2024 · 8 comments
Labels
enhancement (New feature or request), jira (flow issues to jira)

Comments

@dhensel-rh

This issue is observed in the MicroShift project. When multiple start scripts are on the system and an error is encountered in the first script, Greenboot does not fail immediately. With default settings, Greenboot runs all of the checks before rebooting (in 4.16, this is 3 scripts: 40_microshift_running_check.sh, 41_microshift_running_check_multus.sh, 50_microshift_running_check_olm.sh). This might be the intended design, but it delays the system rolling back to a known good state.

On the first boot, each script can take up to 5 minutes to check. That is ~15 minutes. On the second boot, this increases to 10 minutes per script. That is ~30 minutes. On the third boot, each script takes up to 15 minutes. That is ~45 minutes.
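The arithmetic above can be sketched as a small calculation. This is only an illustration of the figures quoted in this issue, not anything taken from Greenboot or MicroShift: the function name `minutes_until_failure_reported` and the `fail_fast` flag are hypothetical, and the sketch assumes that every check that runs consumes its full timeout.

```python
# Worst-case wait before a failure is reported, using the figures in this issue.
# Assumption: every check that runs uses up its whole per-script timeout.
SCRIPTS = [
    "40_microshift_running_check.sh",
    "41_microshift_running_check_multus.sh",
    "50_microshift_running_check_olm.sh",
]

# Per-script timeout in minutes for each boot attempt, as quoted in the issue.
TIMEOUT_MINUTES_BY_BOOT = {1: 5, 2: 10, 3: 15}


def minutes_until_failure_reported(boot: int, fail_fast: bool) -> int:
    """Hypothetical helper: minutes spent in checks when the first script fails."""
    per_script = TIMEOUT_MINUTES_BY_BOOT[boot]
    # With fail_fast only the first script runs; otherwise all of them run.
    scripts_run = 1 if fail_fast else len(SCRIPTS)
    return per_script * scripts_run


for boot in sorted(TIMEOUT_MINUTES_BY_BOOT):
    run_all = minutes_until_failure_reported(boot, fail_fast=False)
    first_only = minutes_until_failure_reported(boot, fail_fast=True)
    print(f"boot {boot}: run all = {run_all} min, stop at first failure = {first_only} min")
```

Under these assumptions, stopping at the first failure would cut the wait on each boot to one script's timeout rather than three.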

Can this be optimized in some way?

@nullr0ute
Member

I think the MicroShift scripts are part of MicroShift, so the bug would need to be reported there, as those are the scripts that are taking the time.

@dhensel-rh
Author

I opened a ticket with MicroShift and discussed this with Gregory Giguashvili previously. He felt this was a Greenboot feature enhancement.

"Looking at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L69, the greenboot check continues the loop until all scripts are run and only checks for the error condition at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L78."

@nullr0ute
Member

Can you provide a link to the MicroShift ticket? The details provided to date don't give that context. Please also link to where the mentioned scripts can be found.

@dhensel-rh
Author

Oh yes. I was not sure if you could view the ticket. Apologies.

https://issues.redhat.com/browse/USHIFT-3165

path to MicroShift greenboot files

@nullr0ute
Member

Seems like it should be a RFE to have an option to either 1) run all tests 2) fail at the first failure. To quote the full comment:

`
Douglas Hensel , this is by design of the current greenboot implementation.

Looking at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L69, the greenboot check continues the loop until all scripts are run and only checks for the error condition at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L78.

May I suggest closing this JIRA issue and opening an upstream feature request at https://github.com/fedora-iot/greenboot/issues?
`
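The behavior described in the quoted comment, and the option proposed above it, can be illustrated with a short model. This is a sketch only and not Greenboot's code (Greenboot itself is a shell script): `run_checks`, `make_check`, and the `fail_fast` parameter are hypothetical names chosen for this example, and each check is modeled as a function returning an exit code.

```python
from typing import Callable, List, Tuple

# A check is modeled as (name, function returning an exit code); 0 means success.
Check = Tuple[str, Callable[[], int]]

ran: List[str] = []  # records which checks actually executed


def make_check(name: str, exit_code: int) -> Check:
    """Hypothetical stand-in for a health-check script with a fixed result."""
    def check() -> int:
        ran.append(name)
        return exit_code
    return (name, check)


def run_checks(checks: List[Check], fail_fast: bool = False) -> List[str]:
    """Run checks in order and return the names of the ones that failed.

    With fail_fast=False every check runs and failures are only examined
    afterwards, which mirrors the behavior described in the quoted comment.
    With fail_fast=True the loop stops at the first failure.
    """
    failed: List[str] = []
    for name, check in checks:
        if check() != 0:
            failed.append(name)
            if fail_fast:
                break
    return failed


checks = [make_check("40", 1), make_check("41", 0), make_check("50", 1)]
print(run_checks(checks))                  # all three checks run
print(run_checks(checks, fail_fast=True))  # stops after the first failure
```

Keeping `fail_fast` off by default preserves the current run-everything behavior, so existing setups would not change unless they opt in.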

@nullr0ute
Member

It also looks like microshift is randomly copying/forking things, it would be useful if those could be actually sent back upstream rather than forking as it also makes it hard to know if the bug is upstream or in your fork :)

openshift/microshift@556a91a

@dhensel-rh
Author

@nullr0ute Do you need me to do anything to start an RFE?

@ggiguash

ggiguash commented Jul 11, 2024

@nullr0ute, let me comment on the issues you're raising.

Seems like it should be a RFE to have an option to either 1) run all tests 2) fail at the first failure.

The comment you're referring to in the JIRA ticket was our best guess on the current greenboot implementation. From the experience we have, I cannot think of a use-case when I would want to continue running greenboot scripts following a failure. It just delays the inevitable.
Nevertheless, if we want to keep the current default behavior, we should add an option to change it.
Can we consider this issue as an RFE for the requested functionality?

microshift is randomly copying/forking things: openshift/microshift@556a91a

Note that the commit you're referring to is not a copy / fork of any greenboot upstream functionality. It's a bug fix in MicroShift internal scripts. These scripts are implementing MicroShift-specific health-check functionality and they cannot be used upstream in the generic greenboot code.

@say-paul added the enhancement (New feature or request) and jira (flow issues to jira) labels on Jul 31, 2024