Mutable dual-stack VIPs #1463

Conversation
/cc @cybertron Can you do the first pass please? There is one part I'm not completely sure about; you will notice it by a bunch of …

/cc @JoelSpeed
Overall looks good. A few comments on specific details, but in principle this is what we've discussed previously.
> * Modifying the virtual IP configuration after the second pair of VIPs has been added. Right now we only want to cover the "create" operation. The rest of the CRUD operations will be covered in the future, depending on need.
>
> * Configuration of any VIPs beyond a second pair for the second IP stack. MetalLB is a better solution for creating arbitrary load balancers.
+1000. This is not a feature to turn the on-prem load balancer into a generic LB service.
> ### Machine Config Operator
>
> [Keepalived templates managed by the Machine Config Operator](https://github.com/openshift/machine-config-operator/blob/2b31b5b58f0d7d9fe6c3e331e4b3f01c9a1bd00c/templates/common/on-prem/files/keepalived.yaml#L46-L49) currently use the `PlatformStatus` fields. We will need to change the code inside the controller which [pulls this value](https://github.com/openshift/machine-config-operator/blob/2b31b5b58f0d7d9fe6c3e331e4b3f01c9a1bd00c/pkg/controller/template/render.go#L551-L575) so that the new `PlatformSpec` path is used.
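To make the intended lookup order concrete, here is a minimal sketch using simplified local stand-ins for the Infrastructure types (the real ones live in `github.com/openshift/api/config/v1`); the helper and field names are illustrative, not the actual MCO code:

```go
package main

import "fmt"

// Simplified stand-ins for the openshift/api Infrastructure shapes.
type BareMetalPlatformSpec struct{ APIServerInternalIPs []string }
type BareMetalPlatformStatus struct{ APIServerInternalIPs []string }

type Infrastructure struct {
	Spec   struct{ BareMetal *BareMetalPlatformSpec }
	Status struct{ BareMetal *BareMetalPlatformStatus }
}

// apiServerInternalIPs prefers the new mutable spec fields and falls back
// to status for clusters whose spec predates them.
func apiServerInternalIPs(infra *Infrastructure) []string {
	if bm := infra.Spec.BareMetal; bm != nil && len(bm.APIServerInternalIPs) > 0 {
		return bm.APIServerInternalIPs
	}
	if bm := infra.Status.BareMetal; bm != nil {
		return bm.APIServerInternalIPs
	}
	return nil
}

func main() {
	var infra Infrastructure
	infra.Status.BareMetal = &BareMetalPlatformStatus{APIServerInternalIPs: []string{"192.0.2.5"}}
	fmt.Println(apiServerInternalIPs(&infra)) // [192.0.2.5]: spec is empty, so status wins
}
```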
So we'll be ignoring platformStatus for these fields after this change? I suspect we're going to be asked to sync the spec and status fields. Maybe we just want CNO to synchronize the changes in spec to status?
We had discussed having $SOMETHING update the status only after the changes are complete, but I have to admit I'm not sure how that will work when CNO is responsible for updating Infrastructure but MCO is responsible for actually rolling out the changes.
My idea was that CNO becomes responsible and CNO syncs `status` with `spec` after the user modifies the Infrastructure CR. If the change passes validations and is rolled out, CNO would set the `status` to what you want. If the change is rejected, it would not.

Then of course you get the discrepancy between "CNO applied the changes" and "MCO rebooted the node and now the changes are live", but it makes me wonder... Today it's the installer setting the `status`, so if for any reason something went wrong with the initial keepalived deployment, could you also end up with a status not reflecting the live config?
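A minimal sketch of the ordering this comment describes, with hypothetical helper names (validation and rollout live elsewhere in the real operator); the only point is that status mirrors spec strictly after validation and rollout succeed:

```go
package main

import (
	"fmt"
	"slices"
)

// Hypothetical stand-ins for CNO internals.
func validateVIPChange(current, desired []string) error {
	if len(desired) < len(current) {
		return fmt.Errorf("removing VIPs is out of scope; only adding is covered")
	}
	return nil
}

func rollOutKeepalivedConfig(vips []string) error { return nil } // stub

// reconcileVIPs returns the status the operator should write back: it
// mirrors spec only once the change has been validated and rolled out.
func reconcileVIPs(spec, status []string) ([]string, error) {
	if slices.Equal(spec, status) {
		return status, nil // nothing to do
	}
	if err := validateVIPChange(status, spec); err != nil {
		return status, err // rejected: status keeps reflecting the live config
	}
	if err := rollOutKeepalivedConfig(spec); err != nil {
		return status, err // not live yet: do not sync status
	}
	return spec, nil // applied: status now mirrors spec
}

func main() {
	status := []string{"192.0.2.5"}
	spec := []string{"192.0.2.5", "2001:db8::5"}
	newStatus, err := reconcileVIPs(spec, status)
	fmt.Println(newStatus, err) // [192.0.2.5 2001:db8::5] <nil>
}
```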
I wrote explicitly that today `PlatformStatus` is not synced from any real-time runtime monitoring, and that I wish to keep it that way. It's not ideal; the elegant way to do it would be to have a controller that monitors keepalived at runtime, but that would be a lot of work that is probably not something we want to do now.
Fair enough. It's a good point that platformStatus isn't necessarily indicative that a given configuration has been applied today either.
> The original enhancement for dual-stack VIPs introduced the idea of creating a separate instance of keepalived for the second IP stack. It was mentioned as something with simpler code changes, but following this path would mean we could have two fundamentally different architectures of OpenShift clusters running in the field.
>
> Because we do not want clusters installed with dual-stack VIPs to differ from clusters converted to dual-stack afterwards, this is probably not the best idea.
Indeed. And since it's already possible to deploy a second VIP as a second keepalived instance[0], we wouldn't need an entire new feature for that (although the existing mechanism has drawbacks because you need to populate some image references manually). That approach was found wanting by users, which is why dual-stack VIPs were properly integrated in the first place.
Force-pushed from 6fe96a0 to 8853dae
> 1. Cluster Network Operator picks up the modification of the object and compares the values with `status.platformStatus.[*].apiServerInternalIPs` and `status.platformStatus.[*].ingressIPs`.
>
> 1. After the requested change passes validation (i.e. it conforms with the goals and non-goals), the change is propagated down to the keepalived template and configuration file.
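A sketch of the comparison in the first step, assuming plain string slices for the IPs (the real fields are typed in openshift/api); the function name is hypothetical:

```go
package main

import "fmt"

// addedVIPs returns the entries present in the edited spec but absent from
// the recorded status — i.e. the newly requested second-stack VIPs.
func addedVIPs(spec, status []string) []string {
	seen := make(map[string]bool, len(status))
	for _, ip := range status {
		seen[ip] = true
	}
	var added []string
	for _, ip := range spec {
		if !seen[ip] {
			added = append(added, ip)
		}
	}
	return added
}

func main() {
	spec := []string{"192.0.2.5", "2001:db8::5"}
	status := []string{"192.0.2.5"}
	fmt.Println(addedVIPs(spec, status)) // [2001:db8::5]
}
```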
Besides conforming with the goals and non-goals, it might be worth validating that the VIP is in the same CIDR as one of the machine networks, to ensure the cluster is dual-stack.
I would say the VIPs added this way need to pass all the same validations as the ones passed to the installer.
Right. In an ideal world we would have a library that implements those validations, so that we don't duplicate code between Cluster Network Operator and Installer, but for now we should simply make sure we validate the same things in both components.
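A sketch of what such a shared helper could check — VIP count, one VIP per IP family, and membership in a machine network, as suggested in the comments above. No such library exists today, so everything here is an assumption:

```go
package main

import (
	"fmt"
	"net/netip"
)

// validateVIPs mirrors an illustrative subset of the installer's checks:
// at most one VIP per IP family, and every VIP must fall within one of the
// machine networks.
func validateVIPs(vips []string, machineNetworks []string) error {
	if len(vips) > 2 {
		return fmt.Errorf("at most two VIPs (one per IP family) are allowed")
	}
	seenFamily := map[bool]bool{} // key: ip.Is4()
	for _, raw := range vips {
		ip, err := netip.ParseAddr(raw)
		if err != nil {
			return fmt.Errorf("invalid VIP %q: %w", raw, err)
		}
		if seenFamily[ip.Is4()] {
			return fmt.Errorf("more than one VIP of the same IP family: %s", ip)
		}
		seenFamily[ip.Is4()] = true
		inNetwork := false
		for _, cidr := range machineNetworks {
			prefix, err := netip.ParsePrefix(cidr)
			if err != nil {
				return fmt.Errorf("invalid machine network %q: %w", cidr, err)
			}
			if prefix.Contains(ip) {
				inNetwork = true
				break
			}
		}
		if !inNetwork {
			return fmt.Errorf("VIP %s is not in any machine network", ip)
		}
	}
	return nil
}

func main() {
	err := validateVIPs(
		[]string{"192.0.2.5", "2001:db8::5"},
		[]string{"192.0.2.0/24", "2001:db8::/64"},
	)
	fmt.Println(err) // <nil>
}
```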
> 1. Administrator of a dual-stack cluster with single-stack VIPs wants to add a second pair of VIPs for the second IP stack.
>
> 1. Administrator edits the Infrastructure CR named `cluster` by changing the `spec.platformSpec.[*].apiServerInternalIPs` and `spec.platformSpec.[*].ingressIPs` fields.
Can you detail how the administrator would edit it, and how it would look after the pair is added?
I think the "how" is an implementation detail that we document in the user-facing documentation, like we do for dual-stack conversion today - https://docs.openshift.com/container-platform/4.13/networking/ovn_kubernetes_network_provider/converting-to-dual-stack.html. Whether you use `oc`, `kubectl`, or `curl` does not really matter for the feature.

I added the sample `platformSpec` after the change for clarity.
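For reference, a hypothetical sample of the edited CR after the second pair is added; field names follow the quoted `spec.platformSpec.[*]` paths, and the addresses are documentation examples, not real values:

```yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: BareMetal
    baremetal:
      apiServerInternalIPs:
        - 192.0.2.5      # original IPv4 VIP
        - 2001:db8::5    # newly added IPv6 VIP
      ingressIPs:
        - 192.0.2.10
        - 2001:db8::10
```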
> 1. After the requested change passes validation (i.e. it conforms with the goals and non-goals), the change is propagated down to the keepalived template and configuration file.
>
> 1. After the keepalived configuration is changed, the service is restarted or reloaded to apply the changes.
It might be worth listing which service it is.

nit: the number of steps is always listed as 1 on lines 66-74
When the markdown is rendered the numbering will be right. This is the syntax for numbered lists in that format.
It's keepalived throughout this sentence, so I'll let Ben assess this from a language perspective, but I feel like "After the keepalived configuration is changed, keepalived is restarted" sounds strange.
> ## Proposal
>
> ### Workflow Description
I know the following might vary for each platform, but is there any expectation that the user will open traffic on the required ports for ingress and API?
There is no additional traffic requirement if you move from one pair of VIPs to two pairs of VIPs.
Probably, although if the VIP is only being added for internal use it may not be strictly necessary. We should probably document this the same as we do for installation. I'm not 100% sure what that looks like today though.
> Currently the controller already reconciles on `CreateEvent` and `UpdateEvent`, so we will implement a set of functions that compare and validate `platformSpec` against `platformStatus` and do what's needed.
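A sketch of a predicate that could gate that reconciliation, assuming the controller-runtime event/predicate API and the openshift/api Infrastructure type; this is not the actual CNO code:

```go
package controller

import (
	configv1 "github.com/openshift/api/config/v1"
	"k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// vipChangedPredicate triggers reconciliation on creation, and on updates
// only when the mutable platform spec actually changed.
func vipChangedPredicate() predicate.Funcs {
	return predicate.Funcs{
		CreateFunc: func(e event.CreateEvent) bool { return true },
		UpdateFunc: func(e event.UpdateEvent) bool {
			oldInfra, okOld := e.ObjectOld.(*configv1.Infrastructure)
			newInfra, okNew := e.ObjectNew.(*configv1.Infrastructure)
			if !okOld || !okNew {
				return false
			}
			// Compare the whole platform spec; the reconciler then diffs it
			// against platformStatus and validates the change.
			return !equality.Semantic.DeepEqual(oldInfra.Spec.PlatformSpec, newInfra.Spec.PlatformSpec)
		},
	}
}
```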
> ### Machine Config Operator
Will MCO be the component responsible for setting the updated `PlatformStatus`?
CNO will be responsible for syncing the spec and status.
Tell me if I got the question wrong, but from #1463 (comment):

- Installer sets the initial values at install time.
- CNO sets the values if they are changed as a cluster-lifecycle operation.

MCO does not own any CR here; it is only responsible for applying the change on the filesystem (i.e. templating the config file).
> ### API Extensions
>
> New fields will have to be added to the platform spec section for [baremetal](https://github.com/openshift/api/blob/938af62eda38e539488d6193e7af292e47a09a5e/config/v1/types_infrastructure.go#L694), [vsphere](https://github.com/openshift/api/blob/938af62eda38e539488d6193e7af292e47a09a5e/config/v1/types_infrastructure.go#L1089) and [OpenStack](https://github.com/openshift/api/blob/938af62eda38e539488d6193e7af292e47a09a5e/config/v1/types_infrastructure.go#L772). In the first implementation only baremetal will be covered.
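A sketch of the shape of that addition for the baremetal case, mirroring the existing status fields; the real definition in openshift/api uses dedicated IP types and API validation markers, so take this as an assumption:

```go
package v1

// BareMetalPlatformSpec holds the desired state of the BareMetal
// infrastructure provider. Sketch only: field names mirror the existing
// BareMetalPlatformStatus fields.
type BareMetalPlatformSpec struct {
	// apiServerInternalIPs are the IP addresses on which the API server is
	// reachable inside the cluster; at most two entries, one per IP family.
	APIServerInternalIPs []string `json:"apiServerInternalIPs,omitempty"`

	// ingressIPs are the IP addresses which route to the default ingress
	// controller; at most two entries, one per IP family.
	IngressIPs []string `json:"ingressIPs,omitempty"`
}
```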
I'm good with starting on baremetal only, but we probably need a follow-up story/epic to track adding the other platforms that support dual-stack.
Force-pushed from 3dac3a2 to 4b62aa8
I see a new …
Force-pushed from 4b62aa8 to 899df67
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle stale`. If this proposal is safe to close now please do so with `/close`.

/lifecycle stale
/remove-lifecycle stale

This is de facto implemented, with the changes in o/api already merged and CNO under development. At this point we only need to merge this enhancement for traceability purposes, but the "discussion phase" has closed.
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting `/remove-lifecycle rotten`. If this proposal is safe to close now please do so with `/close`.

/lifecycle rotten
/remove-lifecycle stale

Already implemented.
/remove-lifecycle stale
/remove-lifecycle rotten

It's not rotten and it's not stale. It has been implemented and the feature is going to 4.16 customers.
/lgtm

@danwinship Would you be able to approve this? The feature is already implemented and has been signed off by all of the interested parties; we just never actually merged this.
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: danwinship
@cybertron needs re-LGTM; changed indentation of sections to match updated template
@mkowalski: all tests passed!
/lgtm