
SDN-4035: IPAM for VMs for OVN Kubernetes secondary networks #1456

Merged

Conversation

@maiqueb (Contributor) commented Aug 16, 2023

This proposal focuses on tying IP address allocation to KubeVirt Virtual Machines instead of the pods where they run.

In short, the IP allocation should be available during the Virtual Machine's life-cycle, rather than the life-cycle of the pod it runs on.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 16, 2023
@openshift-ci-robot commented Aug 16, 2023

@maiqueb: This pull request references SDN-4035 which is a valid jira issue.

In response to this:

TODO


@openshift-ci openshift-ci bot requested review from abhat and danwinship August 16, 2023 15:02
@maiqueb maiqueb force-pushed the ipam-sdn-secondary-networks branch 7 times, most recently from 8e38aef to ffb0c12 Compare August 22, 2023 08:16
@oshoval left a comment

Nice, thanks. A few nits:

pod the VM runs on - the following requirements must be met:

- VM specification should feature a particular annotation - e.g.
`persistentips.cni.cncf.io`. This **cannot** be updated on an existing VM.
@oshoval:

  1. Even if the VM is shut down? Maybe worth mentioning, if so.

What about the following scenarios? The VM is created as stopped and then updated? Shouldn't we support that? Is there any reason not to? Asking because the magic is done only when the VMI is started from the template, isn't it?
(Or the VM is created as stopped, started and stopped, and then updated.)

  2. Maybe worth mentioning that once the annotation is present it can't be removed (I assume it can be, as long as the VM wasn't started, but that's a corner case).

Contributor:

VMs cannot be modified, I think.

I think the allowPersistentIPs flag at the NAD is enough and the VM annotation is not needed, since the user creating the VM (who can be different from the user creating the NAD) explicitly chooses the NAD in the VM spec, so this user already knows it is going to be a persistent-IPs VM.

I doubt the user who created the NAD will go and remove the persistentIPs flag; maybe that is what we have to throw the error on?

It is the same with other NAD attributes; it is like saying the user has to put the NAD subnet as a VM annotation to ensure the NAD is as expected.

Also, a user may want a pair of networks, one with persistentIPs and a different one without it.

@maiqueb (Author):

We need a way in the VM controller to know if it needs to decorate the pod template w/ the required data.

Thus, I think this annotation is helpful - since we don't currently read the NAD contents anywhere in the KubeVirt code base.

@oshoval you're right - I think this is only applied when the VM is running. Still, I think this is a tiny detail we can defer to the implementation stage.

Probably the right thing to do is to actually change KubeVirt's API, adding to the MultusNetwork object if it wants to have persistent IPs.
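
For context, a minimal sketch of how the annotation-based approach could look on a VM. The annotation key is the example from the proposal (`persistentips.cni.cncf.io`); its value, the network name, and the NAD name are made up for illustration:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-a
  annotations:
    # example key from the proposal; exact value/semantics TBD
    persistentips.cni.cncf.io: "true"
spec:
  running: false
  template:
    spec:
      domain:
        devices:
          interfaces:
            - name: tenant-red
              bridge: {}
      networks:
        - name: tenant-red
          multus:
            networkName: tenant-red-nad  # the NAD must allow persistent IPs
```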

@qinqon (Contributor) left a comment

In general we have to talk directly about ovn-k + VMs, since that is the target of the enhancement, instead of a CNI IPAM plugin and an "encapsulating" object.

The persistentIPs metadata.name generation is missing.

Also, describing how the ovn-k controller and the CNI will collaborate will help show how the CNI args are processed.

Comment on lines 379 to 394
```mermaid
sequenceDiagram

actor user

user ->> KubeVirt: startVM(vm=vmA)

KubeVirt ->> Kubernetes: createPod(<br/>name=<generated name>,<br/>owner=vmA,<br/>ips=vmA.status.ips<br/>)

Kubernetes -->> KubeVirt: OK

note over OVN-K: notices a new pod was added


KubeVirt -->> user: OK
```
Contributor:

We can achieve the same using the ovn-k ippool cache, and even without that, the created persistentIP is already in the informers cache, so in reality the access to the api server to read persistentIPs is not going to happen.

@maiqueb (Author):

In the implementation we will - of course - rely on informers to have the data available. That's not negotiable.

I just don't see a better way to represent that the IPAM module needs certain data from the K8S datastore to be available to us.

@maiqueb maiqueb force-pushed the ipam-sdn-secondary-networks branch from ffb0c12 to 7efe1bf Compare August 25, 2023 12:22
@maiqueb (Author) left a comment

Thanks.

Please take another look.


KubeVirt ->> Kubernetes: createPod(name=<generated name>, owner=vmA)

note over OVN-K: notices a new pod was added
@maiqueb (Author):

informers "notice <entity> was added. They don't have / see CNI args.

At most they'll have access to annotations on the pod.

I'll consider adding more detail to the note though.

Comment on lines 127 to 136
The main difference from a traditional IPAM CNI response to a pod delete is that the
`PersistentIP` allocation will not be deleted if the CNI DEL arguments
feature any owner indication. Thus, in the VM scenario, the `PersistentIP`s
will **not** be deleted.
@maiqueb (Author):

Isn't that what I'm saying?

Thus, in the VM scenario, the PersistentIPs will not be deleted.

the PersistentIP allocation will not be deleted if the CNI DEL arguments feature any owner indication

What exactly do you want to see here?
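
To make the "owner indication" concrete, a hedged sketch (the group/version and field names are illustrative, and the object was later standardized upstream as `IPAMClaim`) of how the allocation could be tied to the VM, so that a pod's CNI DEL leaves it intact while deleting the VM releases it via Kubernetes garbage collection:

```yaml
apiVersion: k8s.cni.cncf.io/v1alpha1   # illustrative group/version
kind: PersistentIP
metadata:
  name: vm-a.tenant-red                # illustrative naming scheme
  namespace: vms
  ownerReferences:                     # ties the allocation's lifetime to the VM, not the pod
    - apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      name: vm-a
      uid: "<vm-a-uid>"                # placeholder; set by KubeVirt at creation time
```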

@maiqueb (Author) commented Aug 25, 2023

/cc @dougbtv please keep me honest regarding the whereabouts alternative.

@openshift-ci-robot commented Aug 25, 2023

@maiqueb: This pull request references SDN-4035 which is a valid jira issue.

In response to this:

TODO:

  • Openshift Virt section
  • ...


@openshift-ci-robot commented Aug 28, 2023

@maiqueb: This pull request references SDN-4035 which is a valid jira issue.

In response to this:

TODO:

  • Openshift Virt section
  • Add information on how the PersistentIP CRD will be integrated into the OVN-Kubernetes IP pools
  • Add a section about Kubernetes GC
  • Polish the user configuration part
  • Polish the controller work-flow (do we really need to use the status sub-resource?... If not, what do we gain by it?)

If we want to use the status sub-resource, the flow is bound to change a bit from what is described in the solution.

Probably what we would want to do is: KubeVirt provisions the PersistentIP (w/ spec); the OVN-K plugin will react to it by choosing an IP from the pool, and updating its status; the pod creation must hold until the IP address is available in the status.

The flow described above does sound more complex.
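
To illustrate the spec/status split that flow implies, a sketch (field names are illustrative; the object was ultimately standardized upstream as `IPAMClaim`): KubeVirt fills in the spec when the VM starts, OVN-K fills in the status from its IP pool, and the pod is only wired up once the status carries an address:

```yaml
apiVersion: k8s.cni.cncf.io/v1alpha1
kind: IPAMClaim
metadata:
  name: vm-a.tenant-red      # illustrative naming scheme
  namespace: vms
spec:
  network: tenant-red        # written by KubeVirt when the VM starts
  interface: pod1a2b3c4      # illustrative interface name
status:
  ips:                       # written later by OVN-K from its IP pool
    - 10.200.0.5/24
```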


@maiqueb maiqueb force-pushed the ipam-sdn-secondary-networks branch from 7efe1bf to 6ad5071 Compare August 29, 2023 14:57
@openshift-ci-robot commented Aug 29, 2023

@maiqueb: This pull request references SDN-4035 which is a valid jira issue.

In response to this:

TODO:

  • Openshift Virt section
  • Add information on how the PersistentIP CRD will be integrated into the OVN-Kubernetes IP pools
  • Add a section about Kubernetes GC
  • Polish the user configuration part
  • Polish the controller work-flow (do we really need to use the status sub-resource?... If not, what do we gain by it?)

If we want to use the status sub-resource, the flow is bound to change a bit from what is described in the solution.

Probably what we would want to do is: KubeVirt provisions the PersistentIP (w/ spec); the OVN-K plugin will react to it by choosing an IP from the pool, and updating its status; the pod creation must hold until the IP address is available in the status.

The flow described above does sound more complex.


@maiqueb (Author) commented Aug 29, 2023

@oshoval please review 6ad5071

I've kept the info there to a bare minimum of what I think is required to understand the problem, and the "solution".

Let me hear your thoughts.

@oshoval left a comment

Thanks, focused on this commit

@maiqueb maiqueb force-pushed the ipam-sdn-secondary-networks branch 2 times, most recently from 0d55450 to 1aeff1c Compare August 30, 2023 11:47
@maiqueb (Author) commented Aug 30, 2023

@qinqon please review f78492f, which adds the information about how to integrate the CRD with the existing OVN-K IPPool.

@maiqueb maiqueb force-pushed the ipam-sdn-secondary-networks branch from f78492f to 30da6e0 Compare August 30, 2023 15:05
Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@maiqueb (Author) left a comment

Reworked the document after @jcaamano's review.

`layer3` topologies (secondary networks) since those topologies feature a
subnet per node.

Furthermore, performance wise, implementing the "sticky IPs" feature in the
@maiqueb (Author):

That is a good point; shouting performance without providing any figures is not a good practice.

But here, honestly, I think that given the synchronous way whereabouts works, it will definitely clog these networks (if we demand scale from them).

Right now, we do not have performance figures to meet AFAIK.

#### Why this alternative was not considered further

Implementing the sticky IPs feature in separate IPAM plugins will prevent this
solution from being used on the default cluster network for the HyperShift
@maiqueb (Author):

That's a good point. I'll re-write this to focus on the performance, while leaving this argument in.

Implementing a new CNI IPAM plugin - or refactoring whereabouts to address its
shortcomings - would be a major effort.

### Separate IPAMClaims controller
@maiqueb (Author):

We did consider somehow computing the IP address from the VM's namespace + name (kind of hashing it), but discarded that, since we believe we couldn't come up with a solution that would have no collisions.

Meaning we envision other plugins with IPAM functionality to use this CRD to
provide persistent IPs for their workloads. This proposal is captured in the
following
[de-facto standard update proposal](https://docs.google.com/document/d/1HjPZCfl_3zsXr905T620mF7OFt2QnwxLZomPbeDj0vc).
@maiqueb (Author):

Is this proposal tied to a review/approval process?

Yes.

Is that process completion a requirement for this enhancement?

I would rather keep it less strict, and would depend instead on @dougbtv's sign-off.

But we can try to do that, and re-evaluate if some unforeseen issues are raised.

Comment on lines 107 to 109
3. OVN-K reacts to the creation of the `IPAMClaim`s. It will allocate IP addresses
from the required IP pools, and update the corresponding `IPAMClaim` with the
generated IPs.
@maiqueb (Author):

Added the requested information to the doc. Thanks for raising.

Let's say for example that OVNK is not the only component present in the cluster that can honor IPClaims. OVNK would know it has to honor it when it receives a pod request with an IPClaim on one of the networks it handles but not before that.

I agree with this flow - but I think this is already described in the document. Do you have any suggestion on how I could highlight it?

Comment on lines 108 to 111
3. OVN-K reacts to the creation of the `IPAMClaim`s. It will allocate IP
addresses from the required IP pools, and **only afterward** update the
corresponding `IPAMClaim` with the generated IPs. Users can only rely on / use
the `IPAMClaim` status for informational purposes.
@jcaamano (Contributor):

I think it is worth mentioning here that OVN-K does not react to the creation of the IPAMClaim; it rather reacts to a pod request that features an ipam-claim-reference pointing to that IPAMClaim for one of the networks it is managing.
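
For illustration, a hedged sketch of such a pod request, assuming the `ipam-claim-reference` key of the network selection annotation from the de-facto standard update (the pod, namespace, and claim names are made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: virt-launcher-vm-a-abcde   # created by KubeVirt, not the user
  namespace: vms
  annotations:
    k8s.v1.cni.cncf.io/networks: |
      [{
        "name": "tenant-red-nad",
        "namespace": "vms",
        "ipam-claim-reference": "vm-a.tenant-red"
      }]
spec:
  containers:
    - name: compute
      image: quay.io/kubevirt/virt-launcher  # illustrative image
```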

@maiqueb (Author):

Done.

Is this OK now, @jcaamano?

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@dhellmann (Contributor) commented:

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@jcaamano (Contributor) commented:

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 16, 2024
- the NAD holding the configuration for the secondary network must allow for
persistent IPs - `{..., "allowPersistentIPs": true, ...}`

If the KubeVirt feature gate is enabled, KubeVirt will create the `IPAMClaims`
Member:

IPAMClaim object proposal accepted upstream on Feb 8th 2024
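
To ground the `allowPersistentIPs` knob, a sketch of an OVN-K secondary-network NAD carrying it (the network name, namespace, topology, and subnet are made up):

```yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: tenant-red-nad
  namespace: vms
spec:
  config: |
    {
      "cniVersion": "0.4.0",
      "name": "tenant-red",
      "type": "ovn-k8s-cni-overlay",
      "topology": "layer2",
      "subnets": "10.200.0.0/16",
      "netAttachDefName": "vms/tenant-red-nad",
      "allowPersistentIPs": true
    }
```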

allocations, it is probable the time it takes to provision a pod will grow.

Implementing a new CNI IPAM plugin - or refactoring whereabouts to address its
shortcomings - would be a major effort.
Member:

+1. I think IPAMClaim could be useful for it, but it is non-trivial.

- Reinventing the IPAM wheel
- Productifying a new IPAM solution takes time
- Shipping / release engineering a new component
- We're basically implementing IPAM outside CNI. Clunky flow.
Member:

I wonder if this is a good argument related to KNI as well. Like, implementing in CNI is difficult, and then implementing it outside CNI is clunky. If it's closer to the Kubernetes side than it is to the CNI side, I think the development and maintenance story is better than it is in CNI.

@dougbtv (Member) commented Feb 19, 2024

/lgtm

@dougbtv (Member) commented Feb 19, 2024

/approve

openshift-ci bot commented Feb 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dougbtv


@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed lgtm Indicates that a PR is ready to be merged. labels Feb 19, 2024
@openshift-ci-robot commented:

/retest-required

Remaining retests: 0 against base HEAD e7c9a21 and 2 for PR HEAD da247aa in total

Signed-off-by: Miguel Duarte Barroso <mdbarroso@redhat.com>
@maiqueb maiqueb force-pushed the ipam-sdn-secondary-networks branch from 42f3df1 to ec2638f Compare February 19, 2024 14:42
openshift-ci bot commented Feb 19, 2024

@maiqueb: all tests passed!


@jcaamano (Contributor) commented:

/lgtm
