MCO-1504: Update bootimage management enhancement #1761
base: master
Conversation
@djoshy: This pull request references MCO-1504, which is a valid Jira issue. Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.19.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
> - For machineset backed clusters, this would be updated by the MSBIC after it successfully updates boot images.
> - For non-machineset backed clusters, this would be updated by the cluster admin to indicate the last manually updated bootimage. The cluster admin would need to update this configmap every few releases, when the RHEL minor version that the RHCOS container is built on changes (e.g. 9.6->9.8).
>
> The cluster admin may also choose to opt out of skew management via this configmap, which indicates that they will not require scaling nodes, thereby opting out of skew enforcement and scaling functionality.
I thought of a couple alternate routes for the opt-out here:
- Deleting the configmap. This may add complexity on the MCO to "book keep" the creation/deletion of the configmap. It might be safer to use a field within the configmap to indicate opting out of the skew instead.
- Add a new cluster level "skew-enforcement" knob within the ManagedBootImages API field. I think it is important to keep this separate from the knob that selects machine resources for boot image updates, as using a single control for the "opt-in" and "skew" mechanisms may make things a bit confusing.
Happy to hear other ideas too!
We have done one-off configmaps for some features during upgrade (cgroup default, for example) but I think this has too many contact points to make that management straightforward. I'd lean towards making it an explicit API field (or, I guess, an annotation, like the opt-out).
On the general approach, I think the Proactive approach is easier to maintain, albeit maybe annoying for some users who have to ack every few releases. But then again, if they don't want to scale at all, they can just turn skew enforcement off (do we stop them from scaling altogether? or try on a best-effort basis then?)
+1 on doing this via API as well.
> maybe annoying for some users who have to ack every few releases.
Could you clarify this? My understanding was that if a cluster has opted out of skew enforcement, they wouldn't have to do that again. From the MCO POV, this means that:
- We no longer proactively degrade the cluster if the boot images are out of date.
- If they attempt scaling after that, and the skew is large enough, either of the reactive approaches should cover this scenario.
+1, let's avoid configmaps, they don't age well.
> #### Reactive
> 1. Have the MCS reject new ignition requests if the aforementioned configmap indicates that the cluster's bootimages are out of date. The MCS could then signal to the cluster admin that scale-up is not available until the configmap has been reconciled.
> 2. Add a service to be shipped via RHCOS/MCO templates, which will do a check on the incoming OS container image vs the currently booted RHCOS version. This runs on firstboot right after the MCD pulls the new image, and will prevent the node from rebasing to the updated image if the drift is too far.
I left this in from #1698, in case I was missing something. How would the daemon know the "acceptable" skew during firstboot? I think we could potentially do this after the pivot and yell at the admin, but IMO the "reject join" approach would probably cover this case and never let the firstboot daemon get to the pivot.
I guess we'd have to inject that information into the payload.
Also this would cover cases where the environment doesn't use the MCS
Ahh, I might have misunderstood something here then. Does the first boot daemon have access to the release payload? I thought all it had was the target MachineConfig when it does the first boot pivot.
Possibly Jerry meant "inject that information into the Ignition config"?
> RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
cc @wking @yuqi-zhang
Picking up the conversation from #1698:
> From this point on, MCO will target RHEL 10 for new nodes scaling into this MC
I'll let Jerry weigh in here, but my read was that we aren't planning on doing any MCP-specific enforcement. I think Jerry was implying this would result from the aforementioned enforcement methods.
Hmm, so, when we initially discussed RHEL 10, it was around dual-stream support, where you'd simultaneously have RHEL 9 and RHEL 10 based workers, and each type would have to boot from a bootimage of the same origin major. I think the original intention was to reduce potential 9->10 upgrade issues until RHEL 10 is more stable, but I could be wrong there (cc @sdodson).
When transitioning the cluster's base RHCOS nodes from 9->10, it would be a different problem. I think we'd eventually have to have some cross-compatibility there and allow RHEL 9 bootimages to work for at least one version where the shipped image is RHEL 10.
Force-pushed: 1885750 → 189212c
Force-pushed: 189212c → 34024e9
Force-pushed: 34024e9 → c2ef0ca
> This work will be tracked in [MCO-793](https://issues.redhat.com/browse/MCO-793).
>
> ##### Projected timeline
>
> This is a tentative timeline, subject to change (GA = General Availability, TP = Tech Preview, DEF = Default-on).
What's the difference between GA and DEF?
So, GA is opt-in, and DEF is opt-out, ideally a release later. I'll try to clarify that GA is still opt-in here 👍
How will you make a decision that a platform is sufficiently well tested to be able to go default on? Is there any feedback or data you can gather to show everything is working as you'd expect it to?
So, realistically speaking, there's a limited amount of test coverage for scenarios like this, since this will be more relevant for longer-lived clusters (upgraded through multiple releases, with customizations). That isn't something we generally test within CI (there are some tests in the QE suite that install really old versions and upgrade through the versions of OCP, but they are not run very often as far as I'm aware).
One thing we would like to add for more signal is some metrics; splitting up GA and DEF gives us more time to gather data there for potential error cases (really, the biggest potential problems will be around user Ignition customization).
This distinction can also be blurred for later platforms (i.e. have GA and DEF be the same step). For the current set of platforms we wanted some more soak time for the tests and CI jobs that we do have.
> // skewEnforcement allows an admin to set behavior of the boot image skew enforcement mechanism.
> // Enabled means that the MCO will degrade and prevent upgrades when the boot image skew is too large.
> // Disabled means that the MCO will no longer degrade and will permit upgrades when the boot image skew is
> // too large. This may also hinder the cluster's scaling ability.
Define too large? What are the potential pitfalls of "too large" of a skew?
By too large, I meant that it fails the skew guidance from the release image; I'll clarify the godoc to better describe this. The main pitfall is that scaling would most likely fail, i.e. the pivot to the release OS image isn't possible if your current boot image is below x. If scaling is a non-issue for the cluster in question, they could disable it and the cluster would be able to carry out upgrades again.
> // +optional
> SkewEnforcement SkewEnforcementSelectorMode `json:"skewEnforcement"`
If you were to make the enum values here represent the actual skew of the images, what might this look like?
Do you mean that we should make the skew configurable? My understanding was that it needed to be something constant for a release (defined in the `releaseImage`), and it could potentially change between releases, but not something an operator/admin would get to manually set.
I more mean: you have skew enforcement as Enabled or Disabled. What if skew enforcement were more like ReleaseRecommended and Disabled? Would that make more sense, and allow for a future expansion where an admin could opt in and say, actually, I want SingleRelease skew or DualRelease skew, allowing them to set their own guidelines and override what is recommended by the release image itself?
Hmm, what's the use case you're thinking about? I think the most likely scenario is that they want to override the skew check in the release image because they don't care about scaling. I'm not sure if they'd be interested in making the skew check tighter than we require.
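For concreteness, here is a minimal Go sketch of the knob under discussion. The type and field names come from the quoted godoc; the constant names and the wrapper struct are illustrative assumptions, not the final API, and Joel's suggestion would extend the enum with values like `ReleaseRecommended`, `SingleRelease`, or `DualRelease`.

```go
// Sketch only: type and field names are taken from the quoted diff; the
// constants and the wrapper struct are assumptions for illustration.
package v1

// SkewEnforcementSelectorMode selects the behavior of boot image skew enforcement.
type SkewEnforcementSelectorMode string

const (
	// Enabled: the MCO degrades and blocks upgrades when the boot image skew
	// exceeds the guidance shipped in the release image.
	SkewEnforcementEnabled SkewEnforcementSelectorMode = "Enabled"
	// Disabled: the MCO permits upgrades regardless of skew, accepting that
	// scaling may fail on sufficiently old boot images.
	SkewEnforcementDisabled SkewEnforcementSelectorMode = "Disabled"
)

// ManagedBootImagesSpec is a hypothetical wrapper showing where the field
// might sit, separate from the machine-resource selection knob.
type ManagedBootImagesSpec struct {
	// skewEnforcement allows an admin to set the behavior of the boot image
	// skew enforcement mechanism.
	// +optional
	SkewEnforcement SkewEnforcementSelectorMode `json:"skewEnforcement"`
}
```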
> Introduce a new configmap in the MCO namespace that will store the last updated boot image and allow for easy comparison against the skew policy described in the release payload.
> - For machineset backed clusters, this would be updated by the MSBIC after it successfully updates boot images.
> - For non-machineset backed clusters, this would be updated by the cluster admin to indicate the last manually updated bootimage. The cluster admin would need to update this configmap every few releases, when the RHEL minor version that the RHCOS container is built on changes (e.g. 9.6->9.8).
Why a configmap and not an actual API?
I think I gravitated towards a configmap for this because we could potentially reuse the existing `coreos-bootimages` (golden) configmap, and define a new "current cluster boot image" field within it.
cc @yuqi-zhang in case there was some other goal here; this was my thought 😄
No strong preference here. Like David said, we already have configmaps in place. We could leverage an API field if we wanted to validate at the API level, but maybe not necessary for this case.
You're using the configmap as a proxy API, except configmaps are a poor man's API since they lack any structure or validation. I'd strongly recommend moving away from admins having to set specific values in configmaps, and creating an actual API for this.
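If the configmap route were nonetheless kept, the bookkeeping read might look like the sketch below. This is an assumption-laden illustration: only the `coreos-bootimages` configmap and its namespace are real today; the `currentClusterBootImage` key is hypothetical.

```go
// Sketch: reading a hypothetical bookkeeping key from the existing
// coreos-bootimages (golden) configmap in the MCO namespace.
package bookkeeping

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// currentBootImageKey is a hypothetical key; it does not exist today.
const currentBootImageKey = "currentClusterBootImage"

// readCurrentBootImage fetches the "last updated boot image" bookkeeping value
// for comparison against the skew policy in the release payload.
func readCurrentBootImage(ctx context.Context, client kubernetes.Interface) (string, error) {
	cm, err := client.CoreV1().ConfigMaps("openshift-machine-config-operator").
		Get(ctx, "coreos-bootimages", metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	v, ok := cm.Data[currentBootImageKey]
	if !ok {
		return "", fmt.Errorf("bookkeeping key %q not set in coreos-bootimages", currentBootImageKey)
	}
	return v, nil
}
```

An API field with CEL/schema validation would replace this lookup entirely, which is the thrust of the comment above.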
> - Opt-out of skew enforcement altogether, giving up scaling ability.
>
> #### Reactive
> 1. Have the MCS reject new ignition requests if the aforementioned configmap indicates that the cluster's bootimages are out of date. The MCS could then signal to the cluster admin that scale-up is not available until the configmap has been reconciled.
As someone who maintains the Machine API/Cluster API components, and would have to deal with the customers complaining that their machines can't scale up, I'm a hard no on this idea.
Ignition failures are hard to diagnose already, and we are constantly triaging them as people assume they are a failure in our ability to provision instances.
I think this was proposed because pivoting was going to fail anyway and this was a way of warning the user. cc @yuqi-zhang if I'm missing something here!
There are two intents behind this idea:
- We don't force users to be up to date if they don't want scaling, so this is mostly a fallback error that would hopefully not be hit. If instead we want to say "we always require proactive user action", then we wouldn't need this fallback error.
- The MCS failure will bubble up via the MCO's ClusterOperator object, so the MCO actually degrades alongside no nodes joining the cluster, instead of the "stuck in Provisioned state" we have today (which the MCO would not surface), essentially failing loudly before we even get to the Ignition stage.
If it's incredibly obvious, e.g. from MCS logs, why it is not serving the ignition, then debugging this may become easier, but generally any "I scaled up and my node didn't join the cluster" issue goes to the cluster infra team, and this behaviour sounds like it'll make that more common. I'd be keen to make sure we do all we can to avoid more noise for the cluster infra team.
A variation on the MCS rejection idea is to just serve an Ignition config that writes e.g. an `/etc/issue` with a message explaining that the bootimage is too old. But yeah, the MCS should also surface this on the cluster side so it's not only visible from the node's console.
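A rough sketch of that variation follows, using hand-rolled structs that mirror just enough of the Ignition v3 schema; a real implementation would use the Ignition config library, and the message text is purely illustrative.

```go
// Sketch: instead of rejecting the request outright, serve a minimal Ignition
// config that only drops a console message into /etc/issue.
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

type ignitionConfig struct {
	Ignition struct {
		Version string `json:"version"`
	} `json:"ignition"`
	Storage struct {
		Files []file `json:"files"`
	} `json:"storage"`
}

type file struct {
	Path      string `json:"path"`
	Mode      int    `json:"mode"`
	Overwrite bool   `json:"overwrite"`
	Contents  struct {
		Source string `json:"source"`
	} `json:"contents"`
}

func bootimageTooOldConfig(msg string) ([]byte, error) {
	var cfg ignitionConfig
	cfg.Ignition.Version = "3.4.0"
	// Overwrite the image's default /etc/issue; mode 0644 marshals as 420.
	f := file{Path: "/etc/issue", Mode: 0644, Overwrite: true}
	// Ignition accepts data: URLs for inline file contents.
	f.Contents.Source = "data:," + url.PathEscape(msg)
	cfg.Storage.Files = []file{f}
	return json.Marshal(cfg)
}

func main() {
	out, err := bootimageTooOldConfig("Bootimage too old to join this cluster; update boot images before scaling.\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```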
> RHEL major versions will no longer be cross-compatible, i.e. if you wish to have a RHEL10 machineconfigpool, you must use a RHEL10 bootimage.
This feels like a breaking change, why now?
I understand there's lots changing about our boot images, but is this a one-off, or a constant issue going forward?
This specifically is for dual-stream support, where in some version of OCP (likely 4.20?) we will have a special RHEL 10 pool (design TBD), so your workers in the same OCP version will run different RHEL majors.
We will eventually have to have a RHEL 9->10 upgrade path, so dual-stream aside, generally speaking I think we'd need to have cross-compatibility, so we should probably clarify this.
But we would never want a RHEL 9->11 upgrade path; I think that would be the only breaking case.
Force-pushed: fd8861c → 54d1b56
@djoshy: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Thanks a lot for working on this!
One thing that I think is implied but should probably be spelled out more is how skew comparison actually works, i.e. are we literally parsing RHCOS bootimage version strings and doing comparisons (in that case, recent versioning changes make that trickier)?
Or, I think a saner approach is to compare OCP versions instead, given that RHCOS bootimage versioning is not super meaningful to the rest of OCP, i.e. the skew policies would reference OCP versions and the coreos-bootimages configmap would reference the OCP version it's for?
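To make the OCP-version approach concrete, here is a sketch of the comparison, assuming the skew policy boils down to a maximum minor-version delta carried in the release payload; the function names and the policy shape are illustrative assumptions.

```go
// Sketch: compare OCP versions rather than RHCOS bootimage version strings.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMinor extracts the major/minor pair from an OCP version like "4.19.3".
func parseMinor(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// skewOK reports whether the boot image's OCP version is within the allowed
// minor-version skew of the target release.
func skewOK(bootimageOCPVersion, releaseOCPVersion string, maxMinorSkew int) (bool, error) {
	bMaj, bMin, err := parseMinor(bootimageOCPVersion)
	if err != nil {
		return false, err
	}
	rMaj, rMin, err := parseMinor(releaseOCPVersion)
	if err != nil {
		return false, err
	}
	// Cross-major comparisons would need their own policy (cf. the RHEL 9/10
	// discussion); treat them as out of skew here.
	if bMaj != rMaj {
		return false, nil
	}
	return rMin-bMin <= maxMinorSkew, nil
}

func main() {
	ok, _ := skewOK("4.16.0", "4.19.2", 2) // boot image 3 minors behind, limit 2
	fmt.Println(ok)                        // false
}
```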
> @@ -13,7 +13,7 @@ approvers:
> api-approvers:
Procedural: feel free to add my name in the reviewers section above.
> @@ -27,7 +27,7 @@ superseded-by:
>
> ## Summary
>
> - This is a proposal to manage bootimages via the `Machine Config Operator`(MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. It will also be user opt-in and is planned to be released behind a feature gate.
> + This is a proposal to manage bootimages via the `Machine Config Operator`(MCO), leveraging some of the [pre-work](https://github.com/openshift/installer/pull/4760) done as a result of the discussion in [#201](https://github.com/openshift/enhancements/pull/201). This feature will only target standalone OCP installs. This is now released as an opt-in feature and will be rolled out on a per-platform basis (see projected roadmap). This will eventually be on by default, and the MCO will enforce an accepted skew and require non-platform managed bootimage updates to be acknowledged by the cluster admin.
>
> For `MachineSet` managed clusters, the end goal is to create an automated mechanism that can:
> - update the boot image references in `MachineSets` to the latest in the payload image
I can't comment lower than this line, but I think this line:
> For clusters that are not managed by `MachineSets`, the end goal is to create a document (KB or otherwise) that a cluster admin would follow to update their boot images.
should probably also be updated to mention the strategy of (1) manually bumping the right configmap/API object, and (2) skew enforcement?
> In certain long lived clusters, the MCS TLS cert contained within the above Ignition configuration may be out of date. Example issue [here](https://issues.redhat.com/browse/OCPBUGS-1817). While this has been partly solved by [MCO-642](https://issues.redhat.com/browse/MCO-642) (which allows the user to manually rotate the cert), it would be very beneficial for the MCO to actively manage this TLS cert and take this concern away from the user.
>
> **Note**: As of 4.19, the MCO supports [management of this TLS cert](https://issues.redhat.com/browse/MCO-1208). With this work in place, the MCO can now attempt to upgrade the stub Ignition config, instead of hardcoding to the `*-managed` stub as mentioned previously. This will help preserve any user customizations that were present in the stub Ignition config.
This sentence is confusing because two paragraphs above we say that the MCO will ignore user customizations in the stub and here we say that we can now preserve user customizations. Can we fold this sentence back into that paragraph and reword to reflect exactly what the strategy is?
> This is also considered a blocking issue for [SigStore GA](https://issues.redhat.com/browse/OCPNODE-2619). It has caused issues such as [OCPBUGS-38809](https://issues.redhat.com/browse/OCPBUGS-38809) due to the older podman binary not being able to understand `sigstoreSigned` fields in `/etc/containers/policy.json`. There can be similar issues in the future that can be hard to anticipate.
Should this instead be added to the list of issues linked above so it's all in one place?
> @@ -77,7 +85,7 @@ __Overview__
> - `ManagedBootImages` feature gate is active
> - The cluster and/or the machineset is opted-in to boot image updates. This is done at the operator level, via the `MachineConfiguration` API object.
> - The `machineset` does not have a valid owner reference. Having a valid owner reference typically indicates that the `MachineSet` is managed by another workflow, and that updates to it are likely going to cause thrashing.
> - The golden configmap is verified to be in sync with the current version of the MCO. The MCO will update ("stamp") the golden configmap with the version of the new MCO image after at least 1 master node has successfully completed an update to the new OCP image. This helps prevent `machinesets` being updated too soon at the end of a cluster upgrade, before the MCO itself has updated and has had a chance to roll out the new OCP image to the cluster.
>
> If any of the above checks fail, the MSBIC will exit out of the sync.
> - Based on platform and architecture type, the MSBIC will check if the boot images referenced in the `providerSpec` field of the `MachineSet` are the same as the one in the ConfigMap. Each platform (gcp, aws... and so on) does this differently, so this part of the implementation will have to be special cased. The ConfigMap is considered to be the golden set of bootimage values, i.e. they will never go out of date. If it is not a match, the `providerSpec` field is cloned and updated with the new boot image reference.
Can't comment lower than this, but: should the MSBIC add an owner reference to itself on the MachineSet after updating it? (And obviously change the precondition checks above to check whether the MachineSet has either no owner, or the MSBIC as owner.)
Otherwise, other controllers might have the same logic and also update without taking ownership and you still get thrashing.
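A sketch of what the adjusted precondition could look like if the MSBIC took ownership. The owner identity used here is a hypothetical placeholder, since how the MSBIC would identify itself in an ownerReference is exactly the open design question.

```go
// Sketch: allow patching a MachineSet only if it is unowned or owned by us.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// msbicOwnerName is a hypothetical identifier the MSBIC would stamp into an
// ownerReference when it first patches a MachineSet.
const msbicOwnerName = "machine-set-boot-image-controller"

// canPatchMachineSet allows updates when the MachineSet has no owner at all,
// or when the only owner is the MSBIC itself. Any other owner suggests another
// workflow manages the object, and updating it would risk thrashing.
func canPatchMachineSet(owners []metav1.OwnerReference) bool {
	for _, ref := range owners {
		if ref.Name != msbicOwnerName {
			return false
		}
	}
	return true
}

func main() {
	other := []metav1.OwnerReference{{Kind: "MachineSet", Name: "external-controller"}}
	fmt.Println(canPatchMachineSet(other)) // false: some other workflow owns it
	fmt.Println(canPatchMachineSet(nil))   // true: unowned
}
```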
```mermaid
flowchart-elk TD;
    Start((Start)) -->MachineSetOwnerCheck[Does the MachineSet have an OwnerReference?]
    MachineSetOwnerCheck -->|Yes|Stop
```
(If we add an ownerReference for ourselves, I think this would require changing.)
> Some points to note:
> - For bookkeeping purposes, the MCO will annotate the `MachineConfiguration` object when opting in the cluster by default.
> - This mechanism will be active on installs and upgrades.
Hmm, could it make sense to have different behaviours for new installs vs upgrades? So e.g. when we GA bootimage updates for a platform, we turn it on for new installs. For upgrades, we turn it on in the next release. This provides a natural "rollout" and gives us a higher chance of finding issues before it's on across the board.
> A potential problem here is that the way boot images are stored in the machineset is lossy. In certain platforms, there is no way to recover the boot image metadata from the MachineSet. This is most likely to happen the first time the MCO attempts to do skew enforcement on a cluster that has never had boot image updates. In such cases, the MCO will default to the install time boot image, which can be recovered from the [aleph version](https://github.com/coreos/coreos-assembler/pull/768) of the control plane nodes.
Past the first update, can you clarify how the MSBIC knows which bootimage version is in a MachineSet? Will it add e.g. an annotation on the MachineSet when it patches it?
The way this relates to this line is that, rather than using the aleph of the control plane nodes, we could also just make the installer add the necessary annotation when it creates the MachineSet, right?
Clusters born from installers without that patch won't have the annotation, which implies the boot image is at least older than the OCP release containing the patch. Clusters born from installers with it will have it available.
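A sketch of the annotation-based bookkeeping this comment describes: prefer a (hypothetical) annotation set by the installer or the MSBIC, and fall back to the install-time image recovered from the control plane's aleph version when it is absent. The annotation key is an illustrative assumption.

```go
// Sketch: resolve a MachineSet's boot image version from bookkeeping metadata.
package main

import "fmt"

// bootImageVersionAnnotation is a hypothetical key; the real name would be
// defined by the enhancement (or set by a patched installer, per the comment).
const bootImageVersionAnnotation = "machineconfiguration.openshift.io/boot-image-version"

// resolveBootImageVersion prefers the bookkeeping annotation and falls back to
// the install-time boot image recovered from the control plane's aleph version.
func resolveBootImageVersion(machineSetAnnotations map[string]string, alephVersion string) string {
	if v, ok := machineSetAnnotations[bootImageVersionAnnotation]; ok && v != "" {
		return v
	}
	// No annotation: the MachineSet predates boot image management, so the
	// best available answer is the install-time image.
	return alephVersion
}

func main() {
	fmt.Println(resolveBootImageVersion(nil, "4.12.0")) // falls back to aleph
	fmt.Println(resolveBootImageVersion(
		map[string]string{bootImageVersionAnnotation: "4.19.0"}, "4.12.0")) // annotation wins
}
```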
This is a follow-up update to #1496 and proposes a strategy for implementing an opt-out and skew enforcement mechanism for boot image updates. A lot of this is based on #1698 by @yuqi-zhang - thanks, Jerry!
All comments and questions are welcome. I have a few open questions, for which I'll be leaving comments below.
cc @jlebon @wking
And sorta unrelated: I've also moved some of the older flowcharts to Mermaid diagrams, as they are more maintainable.