Greenboot does not fail when first error is found #143

dhensel-rh · 2024-07-10T12:30:11Z

This issue is observed in MicroShift project. When multiple start scripts are on the system, and an error is encountered in the first script, Greenboot does not immediately fail. Using default settings, Greenboot checks all of them before rebooting (in 4.16, this is 3 scripts 40_microshift_running_check.sh, 41_microshift_running_check_multus.sh, 50_microshift_running_check_olm.sh). This can add delay to the system rolling back to a known good state. This might be intended design, but it can add time before a rollback can occur.

On the first boot, each script can take up to 5 minutes to check. That is ~15 minutes. On the second boot, this increases to 10 minutes per script. That is ~30 minutes. On the third boot, each script takes up to 15 minutes.

Can this be optimized in some way ?

nullr0ute · 2024-07-10T13:21:59Z

I think the microshift scripts are part of microshift, the bug would need to be reported there as those are the scripts that are taking the time

dhensel-rh · 2024-07-10T14:51:00Z

I opened a ticket wit hMicroShift and discussed this with Gregory Giguashvili previously. He felt this was a Greenboot feature enhancement.

'Looking at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L69, the greenboot check continues the loop until all scripts are run and only checks for the error condition at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L78.`

nullr0ute · 2024-07-10T15:38:13Z

Can you provide a link to the microshift ticket, the details provided to date don't provide that context. Also links to where the mentioned scripts can be found.

dhensel-rh · 2024-07-10T15:45:59Z

Oh yes. i was not sure if you could view the ticket. Apologies

https://issues.redhat.com/browse/USHIFT-3165

path to MicroShift greenboot files

nullr0ute · 2024-07-10T15:59:19Z

Seems like it should be a RFE to have an option to either 1) run all tests 2) fail at the first failure. To quote the full comment:

`
Douglas Hensel , this is by design of the current greenboot implementation.

Looking at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L69, the greenboot check continues the loop until all scripts are run and only checks for the error condition at https://github.com/fedora-iot/greenboot/blob/main/usr/libexec/greenboot/greenboot#L78.

May I suggest closing this JIRA issue and opening an upstream feature request at https://github.com/fedora-iot/greenboot/issues?
`

nullr0ute · 2024-07-10T16:00:58Z

It also looks like microshift is randomly copying/forking things, it would be useful if those could be actually sent back upstream rather than forking as it also makes it hard to know if the bug is upstream or in your fork :)

openshift/microshift@556a91a

dhensel-rh · 2024-07-11T11:51:52Z

@nullr0ute Do you need me to do anything to start an RFE ?

ggiguash · 2024-07-11T12:08:34Z

@nullr0ute , let me comment on the issues you're raising.

Seems like it should be a RFE to have an option to either 1) run all tests 2) fail at the first failure.

The comment you're referring to in the JIRA ticket was our best guess on the current greenboot implementation. From the experience we have, I cannot think of a use-case when I would want to continue running greenboot scripts following a failure. It just delays the inevitable.
Nevertheless, if we want to keep the current default behavior, we should add an option to change it.
Can we consider this issue as an RFE for the requested functionality?

microshift is randomly copying/forking things: openshift/microshift@556a91a

Note that the commit you're referring to is not a copy / fork of any greenboot upstream functionality. It's a bug fix in MicroShift internal scripts. These scripts are implementing MicroShift-specific health-check functionality and they cannot be used upstream in the generic greenboot code.

say-paul added enhancement New feature or request jira flow issues to jira labels Jul 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Greenboot does not fail when first error is found #143

Greenboot does not fail when first error is found #143

dhensel-rh commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

dhensel-rh commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

dhensel-rh commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

dhensel-rh commented Jul 11, 2024

ggiguash commented Jul 11, 2024 •

edited

Loading

Greenboot does not fail when first error is found #143

Greenboot does not fail when first error is found #143

Comments

dhensel-rh commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

dhensel-rh commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

dhensel-rh commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

nullr0ute commented Jul 10, 2024

dhensel-rh commented Jul 11, 2024

ggiguash commented Jul 11, 2024 • edited Loading

ggiguash commented Jul 11, 2024 •

edited

Loading