MCO-1504: Update bootimage management enhancement #1761
base: master
Conversation
@djoshy: This pull request references MCO-1504, which is a valid Jira issue. Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
> - For machineset backed clusters, this would be updated by the MSBIC after it successfully updates boot images.
> - For non-machineset backed clusters, this would be updated by the cluster admin to indicate the last manually updated bootimage. The cluster admin would need to update this configmap every few releases, when the RHEL minor version that the RHCOS container is built on changes (e.g. 9.6->9.8).
>
> The cluster admin may also choose to opt out of skew management via this configmap, which indicates that they will not require scaling nodes, thereby opting out of skew enforcement and scaling functionality.
I thought of a couple alternate routes for the opt-out here:
- Deleting the configmap. This may add complexity on the MCO to "book keep" the creation/deletion of the configmap. It might be safer to use a field within the configmap to indicate opting out of the skew instead.
- Add a new cluster level "skew-enforcement" knob within the ManagedBootImages API field. I think it is important to keep this separate from the knob that selects machine resources for boot image updates, as using a single control for the "opt-in" and "skew" mechanisms may make things a bit confusing.
Happy to hear other ideas too!
We have done one-off configmaps for some features during upgrade (cgroup default, for example) but I think this has too many contact points to make that management straightforward. I'd lean towards making it an explicit API field (or, I guess, an annotation, like the opt-out).
On the general approach, I think the Proactive approach is easier to maintain, albeit maybe annoying for some users who have to ack every few releases. But then again, if they don't want to scale at all, they can just turn skew enforcement off (do we stop them from scaling altogether? or try on a best-effort basis then?)
+1 on doing this via API as well.
> maybe annoying for some users who have to ack every few releases.
Could you clarify this? My understanding was that if a cluster has opted out of skew enforcement, they wouldn't have to do that again. From the MCO POV, this means that:
- We no longer proactively degrade the cluster if the boot images are out of date.
- If they attempt scaling after that, and the skew is large enough, either of the reactive approaches should cover this scenario.
+1, let's avoid configmaps, they don't age well.
> #### Reactive
> 1. Have the MCS reject new ignition requests if the aforementioned configmap indicates that the cluster's bootimages are out of date. The MCS could then signal to the cluster admin that scale-up is not available until the configmap has been reconciled.
> 2. Add a service to be shipped via RHCOS/MCO templates, which will do a check on the incoming OS container image vs the currently booted RHCOS version. This runs on firstboot right after the MCD pulls the new image, and will prevent the node from rebasing to the updated image if the drift is too far.
I left this in from #1698, in case I was missing something. How would the daemon know the "acceptable" skew during firstboot? I think we could potentially do this after the pivot and yell at the admin, but IMO the "reject join" approach would probably cover this case and never let the firstboot daemon get to the pivot.
I guess we'd have to inject that information into the payload.
Also this would cover cases where the environment doesn't use the MCS
Ahh, I might have misunderstood something here then. Does the first boot daemon have access to the release payload? I thought all it had was the target MachineConfig when it does the first boot pivot.
Possibly Jerry meant "inject that information into the Ignition config"?
> RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
cc @wking @yuqi-zhang
Picking up the conversation from #1698:
> From this point on, MCO will target RHEL 10 for new nodes scaling into this MC
I'll let Jerry weigh in here, but my read was that we aren't planning on doing any MCP-specific enforcement. I think Jerry was implying this would result from the aforementioned enforcement methods.
Hmm, so, when we initially discussed RHEL 10, it was around dual-stream support, where you'd simultaneously have RHEL 9 and RHEL 10 based workers, and each type would have to boot from a bootimage of the same origin major. I think the original intention was to reduce potential 9->10 upgrade issues until RHEL 10 is more stable, but I could be wrong there (cc @sdodson).
When transitioning the cluster's base RHCOS nodes from 9->10, it would be a different problem. I think we'd eventually have to have some cross-compatibility there and allow RHEL 9 bootimages to work for at least one version where the shipped image is RHEL 10.
Force-pushed: 1885750 → 189212c
Force-pushed: 189212c → 34024e9
Force-pushed: 34024e9 → c2ef0ca
> This work will be tracked in [MCO-793](https://issues.redhat.com/browse/MCO-793).
>
> ##### Projected timeline
>
> This is a tentative timeline, subject to change (GA = General Availability, TP = Tech Preview, DEF = Default-on).
What's the difference between GA and DEF?
So, GA is opt-in, and DEF is opt-out, ideally a release later. I'll try to clarify that GA is still opt-in here 👍
How will you make a decision that a platform is sufficiently well tested to be able to go default on? Is there any feedback or data you can gather to show everything is working as you'd expect it to?
So, realistically speaking, there's a limited amount of test coverage for scenarios like this, since this will be more relevant for longer-lived clusters (upgraded through multiple releases, with customizations). That isn't something we generally test within CI (there are some tests in the QE suite that install really old versions and upgrade through the versions of OCP, but they are not run very often as far as I'm aware).
One thing we would like to add for more signal is some metrics; splitting up GA and DEF gives us more time to gather data there for potential error cases (really, the biggest potential problems will be around user Ignition customization).
This distinction can also be blurred for later platforms (i.e. have GA and DEF be the same step). For the current set of platforms we wanted some more soak time for the tests and CI jobs that we do have.
> // skewEnforcement allows an admin to set behavior of the boot image skew enforcement mechanism.
> // Enabled means that the MCO will degrade and prevent upgrades when the boot image skew is too large.
> // Disabled means that the MCO will no longer degrade and will permit upgrades when the boot image skew is
> // too large. This may also hinder the cluster's scaling ability.
Define too large? What are the potential pitfalls of "too large" of a skew?
By too large, I meant that it fails the skew guidance from the release image; I'll clarify the godoc to better describe this. The main pitfall is that scaling would most likely fail, i.e. the pivot to the release OS image isn't possible if your current boot image is below x. If scaling is a non-issue for the cluster in question, they could disable it and the cluster would be able to carry out upgrades again.
> // +optional
> SkewEnforcement SkewEnforcementSelectorMode `json:"skewEnforcement"`
If you were to make the enum values here represent the actual skew of the images, what might this look like?
Do you mean that we should make the skew configurable? My understanding was that it needed to be something constant for a release (defined in the `releaseImage`), and it could potentially change between releases, but not something an operator/admin would get to manually set.
I more mean: you have skew enforcement as Enabled or Disabled. What if skew enforcement were more like ReleaseRecommended and Disabled? Would that make more sense, and allow for a future expansion where an admin could opt in and say, actually, I want SingleRelease skew or DualRelease skew, allowing them to set their own guidelines and override what is recommended by the release image itself?
Hmm, what's the use case you're thinking about? I think the most likely scenario is that they want to override the skew check in the release image because they don't care about scaling. I'm not sure if they'd be interested in making the skew check tighter than we require.
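For concreteness, here is a minimal Go sketch of the knob under discussion. The type and field names come from the quoted godoc; the constant names and the wrapper struct are illustrative assumptions, not the final API, and Joel's suggestion would extend the enum with values like `ReleaseRecommended`, `SingleRelease`, or `DualRelease`.

```go
// Sketch only: type and field names are taken from the quoted diff; the
// constants and the wrapper struct are assumptions for illustration.
package v1

// SkewEnforcementSelectorMode selects the behavior of boot image skew enforcement.
type SkewEnforcementSelectorMode string

const (
	// Enabled: the MCO degrades and blocks upgrades when the boot image skew
	// exceeds the guidance shipped in the release image.
	SkewEnforcementEnabled SkewEnforcementSelectorMode = "Enabled"
	// Disabled: the MCO permits upgrades regardless of skew, accepting that
	// scaling may fail on sufficiently old boot images.
	SkewEnforcementDisabled SkewEnforcementSelectorMode = "Disabled"
)

// ManagedBootImagesSpec is a hypothetical wrapper showing where the field
// might sit, separate from the machine-resource selection knob.
type ManagedBootImagesSpec struct {
	// skewEnforcement allows an admin to set the behavior of the boot image
	// skew enforcement mechanism.
	// +optional
	SkewEnforcement SkewEnforcementSelectorMode `json:"skewEnforcement"`
}
```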
> Introduce a new configmap in the MCO namespace that will store the last updated boot image and allow for easy comparison against the skew policy described in the release payload.
> - For machineset backed clusters, this would be updated by the MSBIC after it successfully updates boot images.
> - For non-machineset backed clusters, this would be updated by the cluster admin to indicate the last manually updated bootimage. The cluster admin would need to update this configmap every few releases, when the RHEL minor version that the RHCOS container is built on changes (e.g. 9.6->9.8).
Why a configmap and not an actual API?
I think I gravitated towards a configmap for this because we could potentially reuse the existing `coreos-bootimages` (golden) configmap, and define a new "current cluster boot image" field within it.
cc @yuqi-zhang in case there was some other goal here; this was my thought 😄
No strong preference here. Like David said, we already have configmaps in place. We could leverage an API field if we wanted to validate at the API level, but maybe not necessary for this case.
You're using the configmap as a proxy API, except configmaps are a poor man's API since they lack any structure or validation. I'd strongly recommend moving away from admins having to set specific values in configmaps, and creating an actual API for this.
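If the configmap route were nonetheless kept, the bookkeeping read might look like the sketch below. This is an assumption-laden illustration: only the `coreos-bootimages` configmap and its namespace are real today; the `currentClusterBootImage` key is hypothetical.

```go
// Sketch: reading a hypothetical bookkeeping key from the existing
// coreos-bootimages (golden) configmap in the MCO namespace.
package bookkeeping

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// currentBootImageKey is a hypothetical key; it does not exist today.
const currentBootImageKey = "currentClusterBootImage"

// readCurrentBootImage fetches the "last updated boot image" bookkeeping value
// for comparison against the skew policy in the release payload.
func readCurrentBootImage(ctx context.Context, client kubernetes.Interface) (string, error) {
	cm, err := client.CoreV1().ConfigMaps("openshift-machine-config-operator").
		Get(ctx, "coreos-bootimages", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	v, ok := cm.Data[currentBootImageKey]
	if !ok {
		return "", fmt.Errorf("bookkeeping key %q not set in coreos-bootimages", currentBootImageKey)
	}
	return v, nil
}
```

An API field with CEL/schema validation would replace this lookup entirely, which is the thrust of the comment above.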
> - Opt-out of skew enforcement altogether, giving up scaling ability.
>
> #### Reactive
> 1. Have the MCS reject new ignition requests if the aforementioned configmap indicates that the cluster's bootimages are out of date. The MCS could then signal to the cluster admin that scale-up is not available until the configmap has been reconciled.
As someone who maintains the Machine API/Cluster API components, and would have to deal with the customers complaining that their machines can't scale up, I'm a hard no on this idea.
Ignition failures are hard to diagnose already, and we are constantly triaging them as people assume they are a failure in our ability to provision instances.
I think this was proposed because pivoting was going to fail anyway and this was a way of warning the user. cc @yuqi-zhang if I'm missing something here!
There are two intents behind this idea:
- We don't force users to be up to date if they don't want scaling, so this is mostly a fallback error that would hopefully not be hit. If instead we want to say "we always require proactive user action", then we wouldn't need this fallback error.
- The MCS failure will bubble up via the MCO's ClusterOperator object, so the MCO actually degrades alongside no nodes joining the cluster, instead of the "stuck in Provisioned state" we have today (which the MCO would not surface), essentially failing loudly before we even get to the Ignition stage.
If it's incredibly obvious, e.g. from MCS logs, why it is not serving the ignition, then debugging this may become easier, but generally any "I scaled up and my node didn't join the cluster" issue goes to the cluster infra team, and this behaviour sounds like it'll make that more common. I'd be keen to make sure we do all we can to avoid more noise for the cluster infra team.
A variation on the MCS rejection idea is to just serve an Ignition config that writes e.g. an `/etc/issue` with a message explaining that the bootimage is too old. But yeah, the MCS should also surface this on the cluster side so it's not only visible from the node's console.
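A rough sketch of that variation follows, using hand-rolled structs that mirror just enough of the Ignition v3 schema; a real implementation would use the Ignition config library, and the message text is purely illustrative.

```go
// Sketch: instead of rejecting the request outright, serve a minimal Ignition
// config that only drops a console message into /etc/issue.
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

type ignitionConfig struct {
	Ignition struct {
		Version string `json:"version"`
	} `json:"ignition"`
	Storage struct {
		Files []file `json:"files"`
	} `json:"storage"`
}

type file struct {
	Path      string `json:"path"`
	Mode      int    `json:"mode"`
	Overwrite bool   `json:"overwrite"`
	Contents  struct {
		Source string `json:"source"`
	} `json:"contents"`
}

func bootimageTooOldConfig(msg string) ([]byte, error) {
	var cfg ignitionConfig
	cfg.Ignition.Version = "3.4.0"
	// Overwrite the image's default /etc/issue; mode 0644 marshals as 420.
	f := file{Path: "/etc/issue", Mode: 0644, Overwrite: true}
	// Ignition accepts data: URLs for inline file contents.
	f.Contents.Source = "data:," + url.PathEscape(msg)
	cfg.Storage.Files = []file{f}
	return json.Marshal(cfg)
}

func main() {
	out, err := bootimageTooOldConfig("Bootimage too old to join this cluster; update boot images before scaling.\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```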
> RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
This feels like a breaking change, why now?
I understand there's lots changing about our boot images, but is this a one-off, or a constant issue going forward?
This specifically is for dual-stream support, where in some version of OCP (likely 4.20?) we will have a special RHEL 10 pool (design TBD), so your workers in the same OCP version will run different RHEL majors.
We will eventually have to have a RHEL 9->10 upgrade path, so dual-stream aside, generally speaking I think we'd need to have cross-compatibility, so we should probably clarify this.
But we would never want a RHEL 9->11 upgrade path; I think that would be the only breaking case.
Force-pushed: fd8861c → 54d1b56
@djoshy: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Thanks a lot for working on this!
One thing that I think is implied but should probably be spelled out more is how skew comparison actually works, i.e. are we literally parsing RHCOS bootimage version strings and doing comparisons (in that case, recent versioning changes make that trickier)?
Or, I think a saner approach is to compare OCP versions instead, given that RHCOS bootimage versioning is not super meaningful to the rest of OCP, i.e. the skew policies would reference OCP versions and the coreos-bootimages configmap would reference the OCP version it's for?
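To make the OCP-version approach concrete, here is a sketch of the comparison, assuming the skew policy boils down to a maximum minor-version delta carried in the release payload; the function names and the policy shape are illustrative assumptions.

```go
// Sketch: compare OCP versions rather than RHCOS bootimage version strings.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMinor extracts the major/minor pair from an OCP version like "4.19.3".
func parseMinor(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// skewOK reports whether the boot image's OCP version is within the allowed
// minor-version skew of the target release.
func skewOK(bootimageOCPVersion, releaseOCPVersion string, maxMinorSkew int) (bool, error) {
	bMaj, bMin, err := parseMinor(bootimageOCPVersion)
	if err != nil {
		return false, err
	}
	rMaj, rMin, err := parseMinor(releaseOCPVersion)
	if err != nil {
		return false, err
	}
	// Cross-major comparisons would need their own policy (cf. the RHEL 9/10
	// discussion); treat them as out of skew here.
	if bMaj != rMaj {
		return false, nil
	}
	return rMin-bMin <= maxMinorSkew, nil
}

func main() {
	ok, _ := skewOK("4.16.0", "4.19.2", 2) // boot image 3 minors behind, limit 2
	fmt.Println(ok)                        // false
}
```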
> @@ -13,7 +13,7 @@ approvers:
> api-approvers:
Procedural: feel free to add my name in the reviewers section above.
> @@ -27,7 +27,7 @@ superseded-by:
>
> ## Summary
>
> - This is a proposal to manage bootimages via the `Machine Config Operator`(MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. It will also be user opt-in and is planned to be released behind a feature gate.
> + This is a proposal to manage bootimages via the `Machine Config Operator`(MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. This is now released as an opt-in feature and will be rolled out on a per-platform basis (see projected roadmap). This will eventually be on by default, and the MCO will enforce an accepted skew and require non-platform managed bootimage updates to be acknowledged by the cluster admin.
>
> For `MachineSet` managed clusters, the end goal is to create an automated mechanism that can:
> - update the boot image references in `MachineSets` to the latest in the payload image
I can't comment lower than this line, but I think this line:
> For clusters that are not managed by `MachineSets`, the end goal is to create a document (KB or otherwise) that a cluster admin would follow to update their boot images.
should probably also be updated to mention the strategy of (1) manually bumping the right configmap/API object, and (2) skew enforcement?
> In certain long lived clusters, the MCS TLS cert contained within the above Ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved by [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert), it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user.
>
> **Note**: As of 4.19, the MCO supports [management of this TLS cert](https://issues.redhat.com/browse/MCO-1208). With this work in place, the MCO can now attempt to upgrade the stub Ignition config, instead of hardcoding to the `*-managed` stub as mentioned previously. This will help preserve any user customizations that were present in the stub Ignition config.
This sentence is confusing because two paragraphs above we say that the MCO will ignore user customizations in the stub and here we say that we can now preserve user customizations. Can we fold this sentence back into that paragraph and reword to reflect exactly what the strategy is?
> This is also considered a blocking issue for [SigStore GA](https://issues.redhat.com/browse/OCPNODE-2619). It has caused issues such as [OCPBUGS-38809](https://issues.redhat.com/browse/OCPBUGS-38809) due to the older podman binary not being able to understand `sigstoreSigned` fields in `/etc/containers/policy.json`. There can be similar issues in the future that can be hard to anticipate.
Should this instead be added to the list of issues linked above so it's all in one place?
> @@ -77,7 +85,7 @@ __Overview__
> - `ManagedBootImages` feature gate is active
> - The cluster and/or the machineset is opted-in to boot image updates. This is done at the operator level, via the `MachineConfiguration` API object.
> - The `machineset` does not have a valid owner reference. Having a valid owner reference typically indicates that the `MachineSet` is managed by another workflow, and that updates to it are likely going to cause thrashing.
> - The golden configmap is verified to be in sync with the current version of the MCO. The MCO will update ("stamp") the golden configmap with the version of the new MCO image after at least 1 master node has successfully completed an update to the new OCP image. This helps prevent `machinesets` being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and has had a chance to roll out the new OCP image to the cluster.
>
> If any of the above checks fail, the MSBIC will exit out of the sync.
> - Based on platform and architecture type, the MSBIC will check if the boot images referenced in the `providerSpec` field of the `MachineSet` are the same as the one in the ConfigMap. Each platform (gcp, aws... and so on) does this differently, so this part of the implementation will have to be special cased. The ConfigMap is considered to be the golden set of bootimage values, i.e. they will never go out of date. If it is not a match, the `providerSpec` field is cloned and updated with the new boot image reference.
Can't comment lower than this, but: should the MSBIC add an owner reference to itself on the MachineSet after updating it? (And obviously change the precondition checks above to check whether the MachineSet has either no owner, or the MSBIC as owner.)
Otherwise, other controllers might have the same logic and also update without taking ownership and you still get thrashing.
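A sketch of what the adjusted precondition could look like if the MSBIC took ownership. The owner identity used here is a hypothetical placeholder, since how the MSBIC would identify itself in an ownerReference is exactly the open design question.

```go
// Sketch: allow patching a MachineSet only if it is unowned or owned by us.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// msbicOwnerName is a hypothetical identifier the MSBIC would stamp into an
// ownerReference when it first patches a MachineSet.
const msbicOwnerName = "machine-set-boot-image-controller"

// canPatchMachineSet allows updates when the MachineSet has no owner at all,
// or when the only owner is the MSBIC itself. Any other owner suggests another
// workflow manages the object, and updating it would risk thrashing.
func canPatchMachineSet(owners []metav1.OwnerReference) bool {
	for _, ref := range owners {
		if ref.Name != msbicOwnerName {
			return false
		}
	}
	return true
}

func main() {
	other := []metav1.OwnerReference{{Kind: "MachineSet", Name: "external-controller"}}
	fmt.Println(canPatchMachineSet(other)) // false: some other workflow owns it
	fmt.Println(canPatchMachineSet(nil))   // true: unowned
}
```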
```mermaid
flowchart-elk TD;
    Start((Start)) -->MachineSetOwnerCheck[Does the MachineSet have an OwnerReference?]
    MachineSetOwnerCheck -->|Yes|Stop
```
(If we add an ownerReference for ourselves, I think this would require changing.)
> Some points to note:
> - For bookkeeping purposes, the MCO will annotate the `MachineConfiguration` object when opting in the cluster by default.
> - This mechanism will be active on installs and upgrades.
Hmm, could it make sense to have different behaviours for new installs vs upgrades? So e.g. when we GA bootimage updates for a platform, we turn it on for new installs. For upgrades, we turn it on in the next release. This provides a natural "rollout" and gives us a higher chance of finding issues before it's on across the board.
> A potential problem here is that the way boot images are stored in the machineset is lossy. In certain platforms, there is no way to recover the boot image metadata from the MachineSet. This is most likely to happen the first time the MCO attempts to do skew enforcement on a cluster that has never had boot image updates. In such cases, the MCO will default to the install time boot image, which can be recovered from the [aleph version](https://github.com/coreos/coreos-assembler/pull/768) of the control plane nodes.
Past the first update, can you clarify how the MSBIC knows which bootimage version is in a MachineSet? Will it add e.g. an annotation on the MachineSet when it patches it?
The way this relates to this line is that, rather than using the aleph of the control plane nodes, we could also just make the installer add the necessary annotation when it creates the MachineSet, right?
Clusters born from installers without that patch won't have the annotation, which implies the boot image is at least older than the OCP release containing the patch. Clusters born from installers with it will have it available.
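A sketch of the annotation-based bookkeeping this comment describes: prefer a (hypothetical) annotation set by the installer or the MSBIC, and fall back to the install-time image recovered from the control plane's aleph version when it is absent. The annotation key is an illustrative assumption.

```go
// Sketch: resolve a MachineSet's boot image version from bookkeeping metadata.
package main

import "fmt"

// bootImageVersionAnnotation is a hypothetical key; the real name would be
// defined by the enhancement (or set by a patched installer, per the comment).
const bootImageVersionAnnotation = "machineconfiguration.openshift.io/boot-image-version"

// resolveBootImageVersion prefers the bookkeeping annotation and falls back to
// the install-time boot image recovered from the control plane's aleph version.
func resolveBootImageVersion(machineSetAnnotations map[string]string, alephVersion string) string {
	if v, ok := machineSetAnnotations[bootImageVersionAnnotation]; ok && v != "" {
		return v
	}
	// No annotation: the MachineSet predates boot image management, so the
	// best available answer is the install-time image.
	return alephVersion
}

func main() {
	fmt.Println(resolveBootImageVersion(nil, "4.12.0")) // falls back to aleph
	fmt.Println(resolveBootImageVersion(
		map[string]string{bootImageVersionAnnotation: "4.19.0"}, "4.12.0")) // annotation wins
}
```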
This is a follow-up update to #1496 and proposes a strategy for implementing an opt-out and skew enforcement mechanism for boot image updates. A lot of this is based on #1698 by @yuqi-zhang - thanks, Jerry!
All comments and questions are welcome. I have a few open questions, for which I'll be leaving comments below.
cc @jlebon @wking
And sorta unrelated: I've also moved some of the older flowcharts to Mermaid diagrams, as they are more maintainable.