SDN-4035: IPAM for VMs for OVN Kubernetes secondary networks #1456
Conversation
@maiqueb: This pull request references SDN-4035, which is a valid Jira issue.
Nice, thanks. A few nits below.
pod the VM runs on - the following requirements must be met:

- VM specification should feature a particular annotation - e.g. `persistentips.cni.cncf.io`. This **cannot** be updated on an existing VM.
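For illustration, a minimal sketch of a VM carrying such an annotation — the key is the example name from the excerpt above, and the `"true"` value is an assumption rather than a finalized API:

```yaml
# Sketch only: annotation key taken from the example above; the value shape is
# an assumption. Per the proposal, it cannot be changed on an existing VM.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-a
  annotations:
    persistentips.cni.cncf.io: "true"
spec:
  running: true
  template:
    spec:
      domain:
        devices: {}
```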
- Even if the VM is shut down? Maybe worth mentioning it, if so.
What about the following scenarios?
The VM is created as stopped and then is updated? Shouldn't we support that? Is there any reason not to?
Asking because the magic is done only when the VMI is started from the template, isn't it?
(Or the VM is created as stopped, started and stopped, and then updated.)
- Maybe worth mentioning that once the annotation is there it can't be removed (I assume it can as long as the VM wasn't started, but that is a corner case).
VMs cannot be modified, I think.
I think the `allowPersistentIPs` flag at the NAD is enough and the VM annotation is not needed, since the user creating the VM (who can be different from the user creating the NAD) explicitly chooses the NAD in the VM spec, so that user already knows this is going to be a persistent-IPs VM.
I doubt the NAD creation user will go and remove the `persistentIPs` flag; maybe that is what we have to throw the error on?
It is the same with other NAD attributes; it is like saying the user has to put the NAD subnet as a VM annotation to ensure the NAD is as expected.
Also, a user may want a pair of networks, one with persistentIPs and a different one without it.
We need a way in the VM controller to know if it needs to decorate the pod template with the required data.
Thus, I think this annotation is helpful, since we don't currently read the NAD contents anywhere in the KubeVirt code base.
@oshoval you're right - I think this is only applied when the VM is running. But still, I think this is a tiny detail we can kick to the curb for now and sort out at the implementation stage.
Probably the right thing to do is to actually change KubeVirt's API, adding an attribute to the MultusNetwork object indicating whether it wants persistent IPs.
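For illustration only, a hypothetical sketch of that API change — the `persistentIPs` attribute on the `MultusNetwork` object does not exist in KubeVirt today and is purely an assumption:

```yaml
# Hypothetical KubeVirt API change: a per-network flag instead of a VM annotation.
# The `persistentIPs` field below is NOT part of the KubeVirt API today.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-a
spec:
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: tenant-blue
              bridge: {}
      networks:
        - name: tenant-blue
          multus:
            networkName: default/tenant-blue  # the NAD providing the secondary network
            persistentIPs: true               # hypothetical new attribute
```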
In general we have to talk directly about ovn-k + VMs, since that is the target of the enhancement, instead of a CNI IPAM plugin and an "encapsulating" object.
The `persistentIPs` `metadata.name` generation is missing.
Also, describing how the ovn-k controller and CNI will collaborate will help show how the CNI args are processed.
```mermaid
sequenceDiagram
  actor user
  user ->> KubeVirt: startVM(vm=vmA)
  KubeVirt ->> Kubernetes: createPod(<br/>name=<generated name>,<br/>owner=vmA,<br/>ips=vmA.status.ips<br/>)
  Kubernetes -->> KubeVirt: OK
  note over OVN-K: notices a new pod was added
  KubeVirt -->> user: OK
```
We can achieve the same using the ovn-k ippool cache, and even without that, the created persistentIP is already in the informers cache, so in reality the access to the API server to read persistentIPs is not going to happen.
In the implementation we will - of course - rely on informers to have the data available. That's not negotiable.
I just don't see a better way to represent that the IPAM module needs certain data from the K8S datastore to be available to us.
Thanks.
Please take another look.
KubeVirt ->> Kubernetes: createPod(name=<generated name>, owner=vmA)
note over OVN-K: notices a new pod was added
informers "notice <entity> was added. They don't have / see CNI args.
At most they'll have access to annotations on the pod.
I'll consider adding more detail to the note though.
The main difference from a traditional IPAM CNI response to a pod delete is the `PersistentIP` allocation will not be deleted if the CNI DEL arguments feature any owner indication. Thus, on the VM scenario, the `PersistentIP`s will **not** be deleted.
Isn't that what I'm saying?

> Thus, on the VM scenario, the `PersistentIP`s will not be deleted.
>
> the `PersistentIP` allocation will not be deleted if the CNI DEL arguments feature any owner indication

What exactly do you want to see here?
/cc @dougbtv please keep me honest regarding the whereabouts alternative.
Thanks, focused on this commit
Reworked the document after @jcaamano's review.
`layer3` topologies (secondary networks) since those topologies feature a subnet per node.

Furthermore, performance wise, implementing the "sticky IPs" feature in the
That is a good point; shouting performance without providing any figures is not a good practice.
But here, honestly, given the synchronous way whereabouts works, I think it will definitely clog these networks (if we demand scale from them).
Right now, we do not have performance figures to meet, AFAIK.
#### Why this alternative was not considered further

Implementing the sticky IPs feature in separate IPAM plugins will prevent this solution from being used on the default cluster network for the HyperShift
That's a good point. I'll re-write this to focus on the performance, while leaving this argument in.
Implementing a new CNI IPAM plugin - or refactoring whereabouts to address its shortcomings - would be a major effort.

### Separate IPAMClaims controller
We did consider somehow computing the IP address from the VM's namespace + name (kind of hashing it), but discarded it, since we believe we couldn't come up with a solution that would have no collisions.
Meaning, we envision other plugins with IPAM functionality using this CRD to provide persistent IPs for their workloads. This proposal is captured in the following [de-facto standard update proposal](https://docs.google.com/document/d/1HjPZCfl_3zsXr905T620mF7OFt2QnwxLZomPbeDj0vc).
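For reference, a rough sketch of what such an `IPAMClaim` object might look like — the group/version, naming scheme, and field names reflect my reading of the linked proposal and should be treated as illustrative, not authoritative:

```yaml
# Illustrative only: group/version, naming scheme, and field names are taken
# from my reading of the de-facto standard proposal and may differ from the
# final CRD.
apiVersion: k8s.cni.cncf.io/v1alpha1
kind: IPAMClaim
metadata:
  name: vm-a.tenant-blue      # assumed naming scheme: <vm name>.<network name>
  namespace: default
spec:
  network: tenant-blue        # logical network the claim is for
  interface: pod1a2b3c4d5e6   # workload interface the claim is consumed by
```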
> Is this proposal tied to a review/approval process?

Yes.

> Is that process completion a requirement for this enhancement?

I would rather keep it less strict, and would depend instead on @dougbtv's sign-off.
But we can try to do that, and re-evaluate if some unforeseen issues are raised.
3. OVN-K reacts to the creation of the `IPAMClaim`s. It will generate IP addresses from the required IP pools, and update the corresponding `IPAMClaim` with the generated IPs.
Added the requested information to the doc. Thanks for raising.

> Let's say, for example, that OVNK is not the only component present in the cluster that can honor IPClaims. OVNK would know it has to honor it when it receives a pod request with an IPClaim on one of the networks it handles, but not before that.

I agree with this flow - but I think this is already described in the document. Do you have any suggestion on how I could somehow highlight it?
3. OVN-K reacts to the creation of the `IPAMClaim`s. It will allocate IP addresses from the required IP pools, and **only afterward** update the corresponding `IPAMClaim` with the generated IPs. Users can only rely on / use the `IPAMClaim` status for informational purposes.
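To make step 3 concrete, this is roughly what the claim could look like once OVN-K has allocated the addresses — again a sketch; the `status.ips` field name is assumed, and the addresses are documentation-range examples:

```yaml
# Sketch of the claim after allocation; the status field name is assumed.
apiVersion: k8s.cni.cncf.io/v1alpha1
kind: IPAMClaim
metadata:
  name: vm-a.tenant-blue
  namespace: default
spec:
  network: tenant-blue
  interface: pod1a2b3c4d5e6
status:
  ips:                        # written by OVN-K only after the allocation succeeds
    - 192.0.2.25/24
    - 2001:db8::25/64
```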
I think it is worth mentioning here that OVN-K does not react to the creation of the `IPAMClaim`; rather, it reacts to a pod request that features an `ipam-claim-reference` pointing to that `IPAMClaim` for one of the networks it is managing.
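A sketch of the kind of pod request meant here — the `ipam-claim-reference` key in the network selection element follows the de-facto standard proposal as I understand it, and the pod/claim names are assumptions:

```yaml
# Sketch: a virt-launcher-style pod whose secondary-network selection points at
# an IPAMClaim; the `ipam-claim-reference` key name is assumed from the proposal.
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-vm-a-abcde   # illustrative name
  namespace: default
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [{
        "name": "tenant-blue",
        "ipam-claim-reference": "vm-a.tenant-blue"
      }]
spec:
  containers:
    - name: compute
      image: quay.io/kubevirt/virt-launcher   # illustrative image reference
```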
Done. Is this OK now, @jcaamano?
#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.
/lgtm
- the NAD holding the configuration for the secondary network must allow for persistent IPs - `{..., "allowPersistentIPs": true, ...}`

If the KubeVirt feature gate is enabled, KubeVirt will create the `IPAMClaims`
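For context, a sketch of a NAD satisfying that requirement — apart from `allowPersistentIPs`, the OVN-K config values (topology, subnets, names) are illustrative:

```yaml
# Sketch of a secondary-network NAD that allows persistent IPs; all values
# except "allowPersistentIPs" are illustrative.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenant-blue
  namespace: default
spec:
  config: |
    {
      "cniVersion": "0.4.0",
      "name": "tenant-blue",
      "type": "ovn-k8s-cni-overlay",
      "topology": "layer2",
      "subnets": "192.0.2.0/24",
      "netAttachDefName": "default/tenant-blue",
      "allowPersistentIPs": true
    }
```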
IPAMClaim object proposal accepted upstream on Feb 8th 2024
allocations, it is probable that the time it takes to provision a pod will grow.

Implementing a new CNI IPAM plugin - or refactoring whereabouts to address its shortcomings - would be a major effort.
+1. I think IPAMClaim could be useful for it, but it is non-trivial.
- Reinventing the IPAM wheel
- Productifying a new IPAM solution takes time
- Shipping / release engineering a new component
- We're basically implementing IPAM outside CNI. Clunky flow.
I wonder if this is a good argument related to KNI as well. Like, implementing in CNI is difficult, and then implementing it outside CNI is clunky. If it's closer to the Kubernetes side than it is to the CNI side, I think the development and maintenance story is better than it is in CNI.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dougbtv.
@maiqueb: all tests passed!
/lgtm
This proposal focuses on tying IP address allocation to KubeVirt Virtual Machines instead of the pods where they run.
In short, the IP allocation should be available during the Virtual Machine's life-cycle, rather than the life-cycle of the pod it runs on.