Validate compute requests with the assumption that all APIs are maxed out #1964

Open
@RobertLucian

Description

When deploying a new API, we only validate its compute requests against the workloads currently scheduled on the cluster. We don't take into account the case where all of the existing APIs are scaled up to their max number of replicas.

This can leave the cluster overcommitted, so APIs may be unable to scale up beyond their current number of replicas.
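To make the gap concrete, here is a minimal sketch comparing the current check with the one proposed in the title. It assumes a single pool of CPU capacity and ignores memory, GPUs, and per-node bin packing; the `API` type and both function names are hypothetical illustrations, not Cortex code.

```python
from dataclasses import dataclass


@dataclass
class API:
    name: str
    cpu_per_replica: float  # CPU requested by a single replica
    current_replicas: int   # replicas scheduled right now (initial replicas for a new API)
    max_replicas: int       # upper bound the autoscaler may reach


def fits_scheduled(existing, new_api, cluster_cpu):
    """Current behavior: only count what is scheduled on the cluster today."""
    scheduled = sum(a.cpu_per_replica * a.current_replicas for a in existing)
    return scheduled + new_api.cpu_per_replica * new_api.current_replicas <= cluster_cpu


def fits_maxed_out(existing, new_api, cluster_cpu):
    """Proposed behavior: assume every API, including the new one, is at max replicas."""
    reserved = sum(a.cpu_per_replica * a.max_replicas for a in [*existing, new_api])
    return reserved <= cluster_cpu


existing = [API("api-a", cpu_per_replica=1.0, current_replicas=2, max_replicas=10)]
new_api = API("api-b", cpu_per_replica=1.0, current_replicas=1, max_replicas=8)

print(fits_scheduled(existing, new_api, cluster_cpu=16))  # True: 3 of 16 CPUs in use
print(fits_maxed_out(existing, new_api, cluster_cpu=16))  # False: 18 CPUs needed at max scale
```

With the current check, api-b is accepted even though api-a and api-b together could grow to 18 CPUs on a 16-CPU cluster; the maxed-out check rejects it at deploy time instead of letting autoscaling fail later.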

Solution

We can also add a --no-replica-guarantee flag to the deploy command to let an API opt out of this guarantee. On the other hand, APIs deployed without the flag will be guaranteed their max number of replicas. We can achieve that with priority classes (https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass): pods of guaranteed APIs get a higher priority, so they can preempt pods of APIs deployed with --no-replica-guarantee when they need to scale up.
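A minimal sketch of the two priority classes this would need, written as plain Kubernetes objects (`scheduling.k8s.io/v1` `PriorityClass`) plus a helper that sets `priorityClassName` on a pod spec. The class names, the values 1000 and 100, the example image, and the `with_priority_class` helper are all hypothetical; only the relative order of the two `value` fields matters.

```python
import json

# Hypothetical PriorityClass objects. Pods of the higher-valued class
# may preempt pods of the lower-valued class when the cluster is full.
GUARANTEED_CLASS = {
    "apiVersion": "scheduling.k8s.io/v1",
    "kind": "PriorityClass",
    "metadata": {"name": "api-guaranteed"},
    "value": 1000,
    "globalDefault": False,
    "description": "APIs guaranteed their max number of replicas",
}

NO_GUARANTEE_CLASS = {
    "apiVersion": "scheduling.k8s.io/v1",
    "kind": "PriorityClass",
    "metadata": {"name": "api-no-replica-guarantee"},
    "value": 100,
    "globalDefault": False,
    "description": "APIs deployed with --no-replica-guarantee; may be preempted",
}


def with_priority_class(pod_spec: dict, no_replica_guarantee: bool) -> dict:
    """Return a copy of pod_spec with priorityClassName chosen from the (hypothetical) flag."""
    chosen = NO_GUARANTEE_CLASS if no_replica_guarantee else GUARANTEED_CLASS
    return {**pod_spec, "priorityClassName": chosen["metadata"]["name"]}


spec = {"containers": [{"name": "api", "image": "example/api:latest"}]}
print(json.dumps(with_priority_class(spec, no_replica_guarantee=True), indent=2))
```

Keeping the guarantee as a pod-level priority means the scheduler enforces it at scale-up time; the deploy-time validation only needs to count guaranteed APIs at their max replicas.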

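This way, compute resources aren't wasted: the capacity reserved for guaranteed APIs to scale into can be used by APIs deployed with --no-replica-guarantee until the guaranteed APIs need it.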
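The following toy model illustrates why the reserved headroom isn't wasted. It treats the cluster as one CPU pool and evicts strictly lower-priority pods, lowest first, until a new pod fits; this is a simplification, since the real kube-scheduler evaluates preemption node by node and applies additional rules. `Pod`, `Cluster`, and `schedule` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    api: str
    cpu: int
    priority: int  # e.g. 1000 for guaranteed APIs, 100 for --no-replica-guarantee


@dataclass
class Cluster:
    capacity: int  # total CPU, treated as one pool (real clusters pack per node)
    pods: list = field(default_factory=list)

    def free(self) -> int:
        return self.capacity - sum(p.cpu for p in self.pods)

    def schedule(self, pod: Pod) -> list:
        """Place pod, evicting strictly lower-priority pods (lowest first) if needed.

        Returns the evicted pods; raises RuntimeError if pod cannot fit even after preemption.
        """
        victims = sorted((p for p in self.pods if p.priority < pod.priority),
                         key=lambda p: p.priority)
        if self.free() + sum(v.cpu for v in victims) < pod.cpu:
            raise RuntimeError(f"cannot schedule a replica of {pod.api}")
        evicted = []
        while self.free() < pod.cpu:
            victim = victims.pop(0)
            self.pods = [p for p in self.pods if p is not victim]
            evicted.append(victim)
        self.pods.append(pod)
        return evicted


cluster = Cluster(capacity=4)
for _ in range(2):
    cluster.schedule(Pod("api-a", cpu=1, priority=1000))  # guaranteed: 2 of max 4 replicas
for _ in range(2):
    cluster.schedule(Pod("api-b", cpu=1, priority=100))   # no guarantee: uses the headroom
print(cluster.free())  # 0: no reserved capacity sits idle

# api-a scales up; each new replica preempts one api-b replica.
evicted = cluster.schedule(Pod("api-a", cpu=1, priority=1000))
print([p.api for p in evicted])  # ['api-b']
```

Once api-a reaches its max of 4 replicas, no api-b replica can be scheduled, because it has nothing lower-priority to preempt.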

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), provisioning (Something related to cluster provisioning), ux (User experience)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests