Commit 54d1b56

updated motivation, ownerref changes, status API
1 parent c2ef0ca commit 54d1b56

1 file changed

+90
-15
lines changed

enhancements/machine-config/manage-boot-images.md

@@ -42,6 +42,7 @@ Currently, bootimage references are [stored](https://github.com/openshift/instal
4242
- Afterburn [[1](https://issues.redhat.com/browse/OCPBUGS-7559)],[[2](https://issues.redhat.com/browse/OCPBUGS-4769)]
4343
- podman [[1](https://issues.redhat.com/browse/OCPBUGS-9969)]
4444
- skopeo [[1](https://issues.redhat.com/browse/OCPBUGS-3621)]
45+
- composefs [[1](https://github.com/openshift/os/issues/1678#issuecomment-2546310833)]
4546

4647
Additionally, the stub Ignition config [referenced](https://github.com/openshift/installer/blob/1ca0848f0f8b2ca9758493afa26bf43ebcd70410/pkg/asset/machines/gcp/machines.go#L197) in the `MachineSet` is also not managed. This stub is used by the ignition binary in firstboot to auth and consume content from the `machine-config-server`(MCS). The content served includes the actual Ignition configuration and the target OCI format RHCOS image. The ignition binary now does first boot provisioning based on this, then hands off to the `machine-config-daemon`(MCD) first boot service to do the reboot into the target OCI format RHCOS image.
4748

@@ -51,6 +52,9 @@ In certain long lived clusters, the MCS TLS cert contained within the above Igni
5152

5253
**Note**: As of 4.19, the MCO supports [management of this TLS cert](https://issues.redhat.com/browse/MCO-1208). With this work in place, the MCO can now attempt to upgrade the stub Ignition config, instead of hardcoding to the `*-managed` stub as mentioned previously. This will help preserve any user customizations that were present in the stub Ignition config.
5354

55+
This is also considered a blocking issue for [SigStore GA](https://issues.redhat.com/browse/OCPNODE-2619). It has caused issues such as [OCPBUGS-38809](https://issues.redhat.com/browse/OCPBUGS-38809) due to the older podman binary not being able to understand `sigstoreSigned` fields in `/etc/containers/policy.json`. There can be similar issues in the future that can be hard to anticipate.
56+
57+
5458
This is also a soft pre-requisite for both dual-stream RHEL support in OpenShift, and on-cluster layered builds. RPM-OSTree presently does a deploy-from-self to get a new-enough rpm-ostree to deploy image-based RHEL CoreOS systems, and we would like to avoid doing this for bootc if possible. We would also like to prevent RHEL8->RHEL10 direct updates once that is available for OpenShift.
5559

5660
### User Stories
@@ -92,7 +96,6 @@ __Overview__
9296
#### Error & Alert Mechanism
9397

9498
MSBIC sync failures may be caused by multiple reasons:
95-
- The MSBIC notices an OwnerReference and is able to determine that updating the `MachineSet` will likely cause thrashing. This is considered a misconfiguration and in such cases, the user is expected to exclude this `MachineSet` from boot image management.
9699
- The `coreos-bootimages` ConfigMap is unavailable or in an incorrect format. This will likely happen if a user manually edits the ConfigMap, overriding the CVO.
97100
- The `coreos-bootimages` ConfigMap takes too long to be stamped by the MCO. This indicates that there are larger problems in the cluster such as an upgrade failure/timeout or an unrelated cluster failure.
98101
- Patching the `MachineSet` fails. This indicates a temporary API server blip, or larger RBAC issues.
@@ -115,7 +118,7 @@ Any form factor using the MCO and `MachineSets` will be impacted by this proposa
115118
- Standalone OpenShift: Yes, this is the main target form factor.
116119
- microshift: No, as it does [not](https://github.com/openshift/microshift/blob/main/docs/contributor/enabled_apis.md) use `MachineSets`.
117120
- Hypershift: No, Hypershift does not have this issue.
118-
- Hive: Hive manages `MachineSets` via `MachinePools`. The MachinePool controller generates the `MachineSets` manifests (by invoking vendored installer code) which include the `providerSpec`. Once a `MachineSet` has been created on the spoke, the only things that will be reconciled on it are replicas, labels, and taints - [unless a backdoor is enabled](https://github.com/openshift/hive/blob/0d5507f91935701146f3615c990941f24bd42fe1/pkg/constants/constants.go#L518). If the `providerSpec` ever goes out of sync, a warning will be logged by the MachinePool controller but otherwise this discrepancy is ignored. In such cases, the MSBIC will not have any issue reconciling the `providerSpec` to the correct boot image. However, if the backdoor is enabled, both the MSBIC and the MachinePool Controller will attempt to reconcile the `providerSpec` field, causing churn. The Hive team will update the comment on the backdoor annotation to indicate that it is mutually exclusive with this feature.
121+
- Hive: Hive manages `MachineSets` via `MachinePools`. The MachinePool controller generates the `MachineSets` manifests (by invoking vendored installer code) which include the `providerSpec`. Once a `MachineSet` has been created on the spoke, the only things that will be reconciled on it are replicas, labels, and taints - [unless a backdoor is enabled](https://github.com/openshift/hive/blob/0d5507f91935701146f3615c990941f24bd42fe1/pkg/constants/constants.go#L518). If the `providerSpec` ever goes out of sync, a warning will be logged by the MachinePool controller but otherwise this discrepancy is ignored. In such cases, the MSBIC will not have any issue reconciling the `providerSpec` to the correct boot image. However, if the backdoor is enabled, both the MSBIC and the MachinePool Controller will attempt to reconcile the `providerSpec` field, causing churn. The Hive team has [updated the comment](https://github.com/openshift/hive/pull/2596/files) on the backdoor annotation to indicate that it is mutually exclusive with this feature.
119122

120123
##### Supported platforms
121124

@@ -135,13 +138,14 @@ This work will be tracked in [MCO-793](https://issues.redhat.com/browse/MCO-793)
135138

136139
##### Projected timeline
137140

138-
This is a tentative timeline, subject to change (GA = General Availability, TP = Tech Preview, DEF = Default-on).
141+
This is a tentative timeline, subject to change (GA = General Availability (opt-in), TP = Tech Preview (opt-in), DEF = Default-on (opt-out)).
139142

140143
| Platform | TP | GA | DEF |
141144
| -------- | ------- | ------- | ------- |
142145
| gcp | [4.16](https://docs.redhat.com/en/documentation/openshift_container_platform/4.16/html-single/machine_configuration/index#mco-update-boot-images) |[4.17](https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html-single/machine_configuration/index#mco-update-boot-images) |4.19 |
143146
| aws | [4.17](https://docs.redhat.com/en/documentation/openshift_container_platform/4.17/html-single/machine_configuration/index#mco-update-boot-images) |[4.18](https://docs.redhat.com/en/documentation/openshift_container_platform/4.18/html-single/machine_configuration/index#mco-update-boot-images) |4.19 |
144147
| vsphere | 4.20 |4.21 |4.22 |
148+
| azure | 4.20 |4.21 |4.22 |
145149
| baremetal| |4.22 |4.23 |
146150
| openstack| |4.22 |4.23 |
147151
| nutanix | |4.23 |4.24 |
@@ -383,6 +387,77 @@ spec:
383387
name: "cluster"
384388
namespace: "default"
385389
```
390+
391+
Alongside the implementation of default-on behavior, a Status field for ManagedBootImages is also planned. This would reflect the
392+
current ManagedBootImages configuration and if unspecified, it will represent the current cluster defaults.
393+
```
394+
type MachineConfigurationStatus struct {
395+
...
396+
...
397+
398+
// managedBootImagesStatus reflects what the latest cluster-validated boot image configuration is
399+
// and will be used by Machine Config Controller while performing boot image updates.
400+
// +openshift:enable:FeatureGate=ManagedBootImages
401+
// +optional
402+
ManagedBootImagesStatus ManagedBootImages `json:"managedBootImagesStatus"`
403+
}
404+
405+
```
406+
Here are some examples to illustrate how this works.
407+
408+
Scenario: No admin configuration and the current release **does not** opt in by default:
409+
```
410+
apiVersion: operator.openshift.io/v1
411+
kind: MachineConfiguration
412+
spec:
413+
status:
414+
managedBootImagesStatus:
415+
machineManagers:
416+
- resource: machinesets
417+
apiGroup: machine.openshift.io
418+
selection:
419+
mode: None
420+
```
421+
Scenario: No admin configuration and the current release **does** opt in by default:
422+
```
423+
apiVersion: operator.openshift.io/v1
424+
kind: MachineConfiguration
425+
spec:
426+
status:
427+
managedBootImagesStatus:
428+
machineManagers:
429+
- resource: machinesets
430+
apiGroup: machine.openshift.io
431+
selection:
432+
mode: All
433+
```
434+
Regardless of the default-on behavior of the release, if the admin were to add a configuration, the status must reflect that in the next update.
435+
```
436+
apiVersion: operator.openshift.io/v1
437+
kind: MachineConfiguration
438+
spec:
439+
managedBootImages:
440+
machineManagers:
441+
- resource: machinesets
442+
apiGroup: machine.openshift.io
443+
selection:
444+
mode: Partial
445+
partial:
446+
machineResourceSelector:
447+
matchLabels: {}
448+
status:
449+
managedBootImagesStatus:
450+
machineManagers:
451+
- resource: machinesets
452+
apiGroup: machine.openshift.io
453+
selection:
454+
mode: Partial
455+
partial:
456+
machineResourceSelector:
457+
matchLabels: {}
458+
```
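
The scenarios above reduce to one rule: an explicit admin configuration is reflected as-is, and otherwise the status falls back to the release default. The following is a minimal sketch of that rule, not the MCO's implementation. The `MachineManager` type is a trimmed-down stand-in for the real API type, and `resolveStatus` and its `releaseOptsInByDefault` parameter are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// Mode mirrors the selection modes used in the YAML examples.
type Mode string

const (
	ModeNone    Mode = "None"
	ModeAll     Mode = "All"
	ModePartial Mode = "Partial"
)

// MachineManager is a hypothetical, simplified stand-in for the real API type.
type MachineManager struct {
	Resource    string
	APIGroup    string
	Mode        Mode
	MatchLabels map[string]string
}

// resolveStatus (hypothetical helper) returns what the status should report:
// the admin's configuration when one exists, otherwise the release default.
func resolveStatus(spec []MachineManager, releaseOptsInByDefault bool) []MachineManager {
	if len(spec) > 0 {
		// An explicit admin configuration always wins over release defaults.
		return spec
	}
	mode := ModeNone
	if releaseOptsInByDefault {
		mode = ModeAll
	}
	return []MachineManager{{
		Resource: "machinesets",
		APIGroup: "machine.openshift.io",
		Mode:     mode,
	}}
}

func main() {
	// No admin configuration, release does not opt in by default.
	fmt.Println(resolveStatus(nil, false)[0].Mode)
	// No admin configuration, release opts in by default.
	fmt.Println(resolveStatus(nil, true)[0].Mode)
	// Admin configuration present: it is reflected regardless of the default.
	admin := []MachineManager{{
		Resource: "machinesets",
		APIGroup: "machine.openshift.io",
		Mode:     ModePartial,
	}}
	fmt.Println(resolveStatus(admin, true)[0].Mode)
}
```

Keeping the fallback in a single function means the controller can read the status alone and never needs to know whether a value came from the admin or from the release default.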
459+
460+
386461
#### Skew Enforcement
387462
As mentioned in the timeline section, this would only be implemented after default-on behavior has been deemed to be stable across
388463
all platforms.
@@ -393,9 +468,10 @@ type ManagedBootImages struct {
393468
...
394469
...
395470
// skewEnforcement allows an admin to set behavior of the boot image skew enforcement mechanism.
396-
// Enabled means that the MCO will degrade and prevent upgrades when the boot image skew is too large.
397-
// Disabled means that the MCO will no longer degrade and will permit upgrades when the boot image skew is
398-
// too large. This may also hinder the cluster's scaling ability.
471+
// Enabled means that the MCO will degrade and prevent upgrades when the boot image skew exceeds the
472+
// skew limit described by the release image.
473+
// Disabled means that the MCO will no longer degrade and will permit upgrades when the boot image skew
474+
// exceeds the skew limit described by the release image. This will likely hinder the cluster's scaling ability.
399475
// +optional
400476
SkewEnforcement SkewEnforcementSelectorMode `json:"skewEnforcement"`
401477
}
@@ -535,7 +611,7 @@ MachineSet Reconciliation Loop:
535611
```mermaid
536612
flowchart-elk TD;
537613
Start((Start)) -->MachineSetOwnerCheck[Does the MachineSet have an OwnerReference?]
538-
MachineSetOwnerCheck -->|Yes|Error
614+
MachineSetOwnerCheck -->|Yes|Stop
539615
MachineSetOwnerCheck -->|No| ConfigMapCheck[Has the coreos-bootimages ConfigMap been stamped by the MCO?] ;
540616
541617
ConfigMapCheck -->|Yes|ArchType[Determine arch type of MachineSet, for eg: x86_64, aarch64] ;
@@ -548,9 +624,11 @@ flowchart-elk TD;
548624
subgraph PlatformSpecific[Platform Specific]
549625
ProviderSpec -->IgnitionCheck[Is stub Ignition referenced in ProviderSpec in spec 3 format?] ;
550626
IgnitionCheck -->|Yes|CompareBootImage[Compare bootimage in ProviderSpec against the coreos-bootimage ConfigMap] ;
627+
IgnitionCheck -->|No| IgnitionUpgrade[Attempt Ignition Upgrade];
628+
IgnitionUpgrade -->|Ignition Upgrade Successful| CompareBootImage;
551629
end
552630
553-
IgnitionCheck -->|No| Error[Throw an error to the cluster admin];
631+
IgnitionUpgrade -->|Ignition Upgrade Failed| Error[Throw an error to the cluster admin];
554632
Error -->Stop[Stop];
555633
CompareBootImage -->|Mismatch| Patch[Patch MachineSet];
556634
CompareBootImage -->|Match| Stop[Stop];
@@ -590,8 +668,9 @@ flowchart-elk LR;
590668
```
591669
Some points to note:
592670
- For bookkeeping purposes, the MCO will annotate the `MachineConfiguration` object when opting in the cluster by default.
593-
- If the cluster admin wishes to opt-out of the feature, they have to do so by removing the boot image configuration or explicitly opting out the cluster via the API knob. Due to the presence of the "default opted-in" annotation, the MCO will not attempt to opt-in the cluster by default again.
594671
- This mechanism will be active on installs and upgrades.
672+
- If the cluster admin wishes to opt-out of the feature, they have to do so by explicitly opting out the cluster via the API knob prior to the upgrade.
673+
- Any `MachineSet` that has an OwnerReference will be skipped for boot image updates. This will raise an alert/warning to the cluster admin, but it will no longer cause a degrade.
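
The OwnerReference rule in the last point could be sketched as follows. This is an illustration under simplifying assumptions rather than MCO code: the `MachineSet` type is a hypothetical stand-in that models owners as plain strings, and `splitByOwnership` is a name introduced here.

```go
package main

import "fmt"

// MachineSet is a hypothetical, trimmed-down stand-in for the real API object.
type MachineSet struct {
	Name            string
	OwnerReferences []string // names of owning objects, simplified for this sketch
}

// splitByOwnership returns the MachineSets eligible for boot image updates and
// one warning message for every MachineSet skipped because it has an owner.
// Skipped MachineSets produce warnings only; they are never treated as errors.
func splitByOwnership(sets []MachineSet) (eligible []MachineSet, warnings []string) {
	for _, ms := range sets {
		if len(ms.OwnerReferences) > 0 {
			warnings = append(warnings,
				fmt.Sprintf("skipping %s: owned by %v", ms.Name, ms.OwnerReferences))
			continue
		}
		eligible = append(eligible, ms)
	}
	return eligible, warnings
}

func main() {
	sets := []MachineSet{
		{Name: "worker-a"},
		{Name: "worker-b", OwnerReferences: []string{"example-owner"}},
	}
	eligible, warnings := splitByOwnership(sets)
	fmt.Println("eligible:", len(eligible))
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```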
595674

596675

597676
### Enforcement of bootimage skew
@@ -607,19 +686,15 @@ The release payload will describe the current skew policy. The structure of this
607686
Some combination of the following mechanisms should be implemented to alert users, particularly non-machineset backed scaled environments. The options generally fall under proactive enforcement (require users to either update or acknowledge risk before upgrading to a new version) vs. reactive enforcement (only fail when a non-compliant bootimage is being used to scale into the cluster).
608687

609688
#### Proactive
610-
Introduce a new configmap in the MCO namespace that will store the last updated boot image and allows for easy comparison against the
611-
skew policy described in the release payload.
689+
Add a new field in the `coreos-bootimages` configmap in the MCO namespace that will store the cluster's current boot image and allow for easy comparison against the skew policy described in the release payload.
612690
- For machineset backed clusters, this would be updated by the MSBIC after it successfully updates boot images.
613691
- For non-machineset backed clusters, this would be updated by the cluster admin to indicate the last manually updated bootimage. The cluster admin would need to update this configmap every few releases, when the RHEL minor version on which the RHCOS container is built changes (e.g. 9.6->9.8).
614692

615693
The cluster admin may also choose to opt out of skew management via this configmap. Doing so indicates that they will not require scaling nodes, and opts the cluster out of both skew enforcement and scaling functionality.
616694

617695
A potential problem here is that the way boot images are stored in the machineset is lossy. In certain platforms, there is no way to recover the boot image metadata from the MachineSet. This is most likely to happen the first time the MCO attempts to do skew enforcement on a cluster that has never had boot image updates. In such cases, the MCO will default to the install time boot image, which can be recovered from the [aleph version](https://github.com/coreos/coreos-assembler/pull/768) of the control plane nodes.
618696

619-
This configmap can then be monitored to enforce skew limits. This could be done in a couple of ways:
620-
- **via the MCO**: If the skew is determined to be too large, the MCO can update its `ClusterOperator` object with an `Upgradeable=False` condition, along with remediation steps in the `Condition` message. This will signal to the CVO that the cluster is not suitable for an upgrade. The drawback of this approach is that the MCO is not able to signal *prior* to the start of a cluster upgrade, so if an incoming upgrade has a "stricter" skew policy, this could break scaling until the admin takes the remediation steps during the upgrade or after the upgrade is complete. This may present as strange UX to the user.
621-
622-
- **via the CVO**: If the CVO is able to do the configmap monitoring, the enforcement can be a bit more proactive. The CVO could then potentially block an incoming upgrade based on the skew policy described in the new release payload, until the remediation steps have been done.
697+
This configmap can then be monitored to enforce skew limits. If the skew is determined to be too large, the MCO can update its `ClusterOperator` object with an `Upgradeable=False` condition, along with remediation steps in the `Condition` message. This will signal to the CVO that the cluster is not suitable for an upgrade.
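
As a rough sketch of this check, the code below derives an `Upgradeable` condition from the recorded boot image version and a skew limit. The shape of the skew policy is an assumption: it is modeled here as a single minimum allowed `major.minor` version, which may not match the structure in the release payload. The `Condition` type, `parseMajorMinor`, and `upgradeableCondition` are hypothetical names, not MCO or library APIs.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Condition is a simplified, hypothetical stand-in for a ClusterOperator condition.
type Condition struct {
	Type    string
	Status  string
	Message string
}

// parseMajorMinor parses the leading "major.minor" of a version string such as "4.16".
func parseMajorMinor(v string) (major, minor int, err error) {
	parts := strings.Split(v, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("invalid major in %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("invalid minor in %q: %w", v, err)
	}
	return major, minor, nil
}

// upgradeableCondition reports Upgradeable=False when the cluster's recorded
// boot image is older than the minimum the (assumed) skew policy allows.
func upgradeableCondition(clusterBootImage, minimumAllowed string) (Condition, error) {
	cMaj, cMin, err := parseMajorMinor(clusterBootImage)
	if err != nil {
		return Condition{}, err
	}
	mMaj, mMin, err := parseMajorMinor(minimumAllowed)
	if err != nil {
		return Condition{}, err
	}
	if cMaj < mMaj || (cMaj == mMaj && cMin < mMin) {
		return Condition{
			Type:   "Upgradeable",
			Status: "False",
			Message: fmt.Sprintf("boot image %s is older than the skew limit %s; "+
				"update boot images or opt out of skew enforcement",
				clusterBootImage, minimumAllowed),
		}, nil
	}
	return Condition{Type: "Upgradeable", Status: "True"}, nil
}

func main() {
	cond, err := upgradeableCondition("4.13", "4.16")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s=%s %s\n", cond.Type, cond.Status, cond.Message)
}
```

Versions are compared numerically rather than as strings, so `4.9` is correctly treated as older than `4.16`.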
623698

624699
As stated earlier, to remediate, the cluster admin would then have to do one of the following:
625700
- Turn on boot image updates if it is a machineset backed cluster.
