Greenboot should notify users of abrupt power failure #103

Open
say-paul opened this issue Jun 9, 2023 · 11 comments

Comments

@say-paul
Member

say-paul commented Jun 9, 2023

Greenboot should notify the user (via the MOTD) on the next boot when it detects that the previous boot cycle ended abruptly, for example after a power loss on the device or a force-kill of a VM.
Services or other units that depend on the shutdown targets may not have run correctly, which can cause issues on the next boot.
Example: with rpm-ostree, a staged update is lost if there is a sudden power failure.

@nullr0ute
Member

How would you suggest it detects this?

@say-paul
Member Author

say-paul commented Jun 9, 2023

I was thinking of parsing the journal and checking for shutdown.target/reboot.target, or some related target, to verify whether the system was shut down gracefully.
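
A rough sketch of that check, not something greenboot does today. It assumes the previous boot's journal is available as one JSON object per line (the format `journalctl -o json` produces) and that systemd's messages about a unit carry a `UNIT` field. The function name and the exact set of targets are assumptions for illustration.

```python
import json
from typing import Iterable

# Targets treated as evidence of an orderly shutdown. shutdown.target and
# reboot.target come from the comment above; the other two are assumed additions.
SHUTDOWN_UNITS = {"shutdown.target", "reboot.target", "poweroff.target", "halt.target"}


def previous_boot_was_graceful(entries: Iterable[str]) -> bool:
    """Return True if any journal entry refers to a shutdown-related target."""
    for line in entries:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            # An abruptly cut journal may end in a truncated record; skip it.
            continue
        if isinstance(entry, dict) and entry.get("UNIT") in SHUTDOWN_UNITS:
            return True
    return False
```

On a live system the input could come from `journalctl -b -1 -o json`, which only helps if the previous boot's journal was kept and can be read by the running journalctl.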

@cgwalters
Contributor

We've had issues with trying to use the journal in our "control loops"; a very pertinent one related to this is that rhel8's journalctl will fail to parse journals generated from rhel9 (and historically the MCO did exactly this with journalctl).

See e.g. openshift/os#1271

Today rpm-ostree actually also does exactly this to detect whether ostree-finalize-staged failed, and there's the whole rpm-ostree history stuff; see https://github.com/coreos/rpm-ostree/blob/main/rust/src/journal.rs

That said recently I did ostreedev/ostree#2589 which is a bit related here. Arguably indeed we could extend things with a similar model where we persist "attempt to reboot with pending changes to apply" in a persistent non-journal place.
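
A minimal sketch of persisting that state outside the journal. This is not what ostree or rpm-ostree do today; the marker file, its JSON layout, and both function names are hypothetical. One function would run when a deployment is staged or a reboot is requested, the other early on the next boot.

```python
import json
import os
from pathlib import Path
from typing import Optional


def record_pending_deployment(marker: Path, staged_checksum: str) -> None:
    """Persist, outside the journal, that `staged_checksum` should boot next."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    tmp = marker.with_suffix(".tmp")
    with open(tmp, "w") as f:
        json.dump({"staged": staged_checksum}, f)
        f.flush()
        os.fsync(f.fileno())  # push the contents to disk before the rename
    # Rename is atomic on POSIX filesystems, so a reader never sees a partial
    # marker. A real implementation would likely fsync the directory as well.
    tmp.replace(marker)


def check_after_boot(marker: Path, booted_checksum: str) -> Optional[str]:
    """Return a warning if the recorded deployment is not the one that booted."""
    if not marker.exists():
        return None
    try:
        staged = json.loads(marker.read_text()).get("staged")
    except (json.JSONDecodeError, AttributeError):
        staged = None  # unreadable marker: nothing reliable to report
    marker.unlink()  # the marker only describes a single boot transition
    if staged and staged != booted_checksum:
        return f"pending deployment {staged} was not applied; booted {booted_checksum}"
    return None
```

The returned string is what a tool could then surface to the user, for example in the MOTD.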

@cgwalters
Contributor

We also need to support systems that don't use a persistent journal. So in general if it's critical, then it can't be in the journal and needs to be external to it. You're arguing for something informative which could be in the journal, but it still gets tricky for the above reasons.

@cgwalters
Contributor

Here's a strawman proposal; what if we just merged the greenboot code as is into github.com/coreos/rpm-ostree ?

We'd make it a new subpackage; the RPM-level transition could either be that we start generating subpackages literally named the same things (possible AFAIK) or we make a new rpm-ostree-greenboot that Obsoletes: greenboot. But all the binaries, config files and services would remain the same.

@say-paul
Member Author

Regarding

> We also need to support systems that don't use a persistent journal

I know we did something for rhel9/8 to enable this: osbuild/osbuild-composer#3118, I guess we can do that for fedora too.

> very pertinent one related to this is that rhel8's journalctl will fail to parse journals generated from rhel9

I'd like to understand how integrating greenboot into rpm-ostree would solve the above problem.

@runcom
Member

runcom commented Jun 12, 2023

> I know we did something for rhel9/8 to enable this: osbuild/osbuild-composer#3118, I guess we can do that for fedora too.

@say-paul I think Colin is referring to operating systems without a journal altogether, not to enabling persistence there.

> Here's a strawman proposal; what if we just merged the greenboot code as is into github.com/coreos/rpm-ostree ?

Having worked on the MCO and the journald thing, I agree it's not ideal and we can't use it; we'd definitely need something more robust. @cgwalters not sure, maybe I've missed it: after merging it into rpm-ostree, would the plan be to better integrate it with rpm-ostree?

@cgwalters
Contributor

> @cgwalters not sure, maybe I've missed it: after merging it into rpm-ostree, would the plan be to better integrate it with rpm-ostree?

To start, my thoughts on this are primarily at a very practical level:

  • Do either of you watch activity (PRs and issues) for rpm-ostree? I suspect not. I haven't been watching greenboot activity (but I am now); by literally having them in the same codebase, that happens automatically
  • CI: It looks to me like there is no CI on this repository that does integration testing; that's something we've invested heavily in in the coreos/ org
  • Rust infrastructure too: As https://github.com/fedora-iot/greenboot/commits/greenboot-rs progresses, there's a ton of "overhead and maintenance" stuff for Rust projects that we've invested in (to start, things like dependabot handling, consistent CI checks and MSRV handling, integration with https://github.com/coreos/cargo-vendor-filterer )
  • Aligning releases

Beyond the "infrastructure" level, I really want to integrate greenboot state into rpm-ostree status. I think I mentioned this elsewhere but for https://github.com/coreos/zincati/ we did a ton of work to add this "driver registration" interface basically just so that when you type rpm-ostree upgrade it tells you "no, updates are driven by zincati".

For greenboot, I think the basic integration here would be showing when the current boot was the target of an automated rollback - and surfacing that in a consistent way via the same rpm-ostree status --json and/or DBus API.
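
A sketch of how a consumer might read that from the JSON output, under loud assumptions. The `deployments` list and the `booted` and `checksum` keys are what `rpm-ostree status --json` reports today as far as I know; the `greenboot-rollback` key is invented purely for illustration and does not exist in rpm-ostree. The function names are hypothetical too.

```python
import json
from typing import Any, Dict, Optional

# HYPOTHETICAL: rpm-ostree does not emit this key; it stands in for whatever
# field an eventual greenboot integration might add.
HYPOTHETICAL_ROLLBACK_KEY = "greenboot-rollback"


def booted_deployment(status: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the deployment marked as booted, or None if there is none."""
    for deployment in status.get("deployments", []):
        if deployment.get("booted"):
            return deployment
    return None


def describe_boot(status_json: str) -> str:
    """Summarize the booted deployment from `rpm-ostree status --json` text."""
    deployment = booted_deployment(json.loads(status_json))
    if deployment is None:
        return "no booted deployment found"
    checksum = deployment.get("checksum", "unknown")
    if deployment.get(HYPOTHETICAL_ROLLBACK_KEY):
        return f"booted {checksum} (result of an automated rollback)"
    return f"booted {checksum}"
```

The point of the sketch is only that tools would read one consistent place instead of each parsing greenboot's own state.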

@runcom
Member

runcom commented Jun 12, 2023

> Beyond the "infrastructure" level, I really want to integrate greenboot state into rpm-ostree status

That would indeed be ideal, and I think we had this discussion elsewhere too. Maybe in the future we could integrate other greenboot functionality into rpm-ostree as well (mainly boot stuff, if I remember correctly).

Infrastructure-wise, yeah, our integration tests aren't wired up here; osbuild-composer "drives" them, and our QE team does too, which is not ideal. We indeed do not watch rpm-ostree closely. I'm not against this at all; I think it would be beneficial to keep advancing greenboot. There are still things to get right (for example, somebody changes something in /etc that breaks a greenboot check and no reboot happens; then a working upgrade comes, but the greenboot check fails because of an issue unrelated to the upgrade). Let's see what others think too.

@miabbott
Member

> Beyond the "infrastructure" level, I really want to integrate greenboot state into rpm-ostree status

> That would indeed be ideal, and I think we had this discussion elsewhere too. Maybe in the future we could integrate other greenboot functionality into rpm-ostree as well (mainly boot stuff, if I remember correctly).

We started the integration conversation in the context of ostree - ostreedev/ostree#2725

@cgwalters
Contributor

> We started the integration conversation in the context of ostree - ostreedev/ostree#2725

Ultimately having this in ostree does I think make the most sense, but at a practical level today the code is invoking rpm-ostree, and I was thinking of this as the "no code changes" move. We can still lower into ostree later.

5 participants