MVP support for serving workloads running as LeaderWorkerSet #3232
/assign @vladikkuzn
I synced with @mbobrovskyi and @vladikkuzn on the feature and it seems complex, so I propose writing a KEP for it and going through the Alpha phase, so that we can easily update the implementation in the future.
The already identified follow ups needed after #3515:
/reopen
@mimowo: Reopened this issue.
Also, I would like to have an e2e test for scaling the LWS when the Startup policy is Leader ready (default).
What would you like to be added:
MVP support for LeaderWorkerSet in Kueue. It does not need to be ideal, but we want to have some support to unblock users and collect users' feedback.
The idea is to base the support on StatefulSets, so the integration would also use Pod Groups, as it does for regular StatefulSets. Each LeaderWorkerGroup creates a new Pod Group. In a single pod group we will have:
The size of the group will be taken from LeaderWorkerSet.Spec.LeaderWorkerTemplate.Size and increased by 1 (to include the leader).
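To make the sizing rule concrete, here is a minimal sketch in Python. It is not Kueue code: the `PodGroup` type, the `pod_groups_for_lws` helper, and the `<name>-<index>` naming scheme are all hypothetical, and the only rule taken from this proposal is "one Pod Group per LeaderWorkerGroup, with size equal to `LeaderWorkerTemplate.Size` plus 1".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PodGroup:
    """Hypothetical stand-in for a Kueue pod group (illustration only)."""
    name: str         # assumed naming scheme, not taken from Kueue
    total_count: int  # number of pods that make up the group


def pod_groups_for_lws(lws_name: str, replicas: int, template_size: int) -> List[PodGroup]:
    """Return one pod group per LeaderWorkerGroup.

    Follows the rule as stated in this proposal: the group size is
    LeaderWorkerTemplate.Size increased by 1 to include the leader.
    """
    if replicas < 0 or template_size < 0:
        raise ValueError("replicas and template_size must be non-negative")
    group_size = template_size + 1  # +1 for the leader, per the proposal
    return [
        PodGroup(name=f"{lws_name}-{index}", total_count=group_size)
        for index in range(replicas)
    ]


if __name__ == "__main__":
    for group in pod_groups_for_lws("llm-serving", replicas=2, template_size=3):
        print(group.name, group.total_count)
```

Under these assumptions, a LeaderWorkerSet with 2 replicas and a template size of 3 would yield two pod groups of 4 pods each.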
This is a follow up to #2717.
Why is this needed:
We want to support serving primitives in Kueue because a growing number of users run clusters that mix AI training and inference and want to manage the expensive GPU resources.
LeaderWorkerSet is a new serving API which is gaining popularity as a primitive to host AI/ML inference.