Validate compute requests with the assumption that all APIs are maxed out #1964

#### Description

When deploying a new API, we only validate its compute request against the workloads currently scheduled on the cluster. We don't account for the case where every existing API is scaled up to its max number of replicas.

As a result, APIs may be unable to scale up from their current number of replicas because the cluster is overcommitted.
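A minimal sketch of the difference between the two checks. It assumes a single CPU dimension and treats the cluster as one pool of capacity; real validation would also need to consider memory, GPUs, and per-node packing. `APICompute`, `fits_scheduled`, and `fits_maxed_out` are hypothetical names used for illustration, not Cortex's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class APICompute:
    # Hypothetical, simplified view of an API's compute settings.
    name: str
    cpu_per_replica: float  # CPU requested by each replica
    current_replicas: int   # replicas scheduled right now
    max_replicas: int       # upper bound the autoscaler may reach


def fits_scheduled(existing: List[APICompute], new_api: APICompute, cluster_cpu: float) -> bool:
    # Current behavior: only the replicas that are scheduled right now are counted.
    used = sum(a.cpu_per_replica * a.current_replicas for a in existing + [new_api])
    return used <= cluster_cpu


def fits_maxed_out(existing: List[APICompute], new_api: APICompute, cluster_cpu: float) -> bool:
    # Proposed behavior: assume every API, including the new one, is at max_replicas.
    worst_case = sum(a.cpu_per_replica * a.max_replicas for a in existing + [new_api])
    return worst_case <= cluster_cpu


existing = [APICompute("api-a", cpu_per_replica=1.0, current_replicas=2, max_replicas=6)]
new_api = APICompute("api-b", cpu_per_replica=1.0, current_replicas=1, max_replicas=4)

print(fits_scheduled(existing, new_api, cluster_cpu=8.0))  # True: 3 CPU scheduled
print(fits_maxed_out(existing, new_api, cluster_cpu=8.0))  # False: 10 CPU at max
```

With an 8-CPU cluster the new API passes today's check, but once both APIs scale to their max replicas they would need 10 CPU, which is exactly the overcommitted situation described above.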

#### Solution

In addition to validating every API against its max number of replicas, we can provide a `--no-replica-guarantee` flag for the deploy command that lets an API opt out of this guarantee. APIs that don't use the flag will be guaranteed their max number of replicas. We can achieve this with priority classes (https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass), giving guaranteed APIs a higher priority than APIs deployed with the flag.
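A sketch of how the flag could map onto priority classes, written as Python dicts that mirror the Kubernetes manifests. The class names and `value`s are hypothetical; the manifest fields (`apiVersion: scheduling.k8s.io/v1`, `kind: PriorityClass`, `value`, `globalDefault`, `description`) and the pod's `priorityClassName` field are the standard Kubernetes ones. `with_priority` is a hypothetical helper.

```python
# Hypothetical class names and values; not necessarily what Cortex would use.
GUARANTEED_CLASS = "api-guaranteed"
BEST_EFFORT_CLASS = "api-no-replica-guarantee"


def priority_class(name: str, value: int, description: str) -> dict:
    # Mirrors the fields of a scheduling.k8s.io/v1 PriorityClass object.
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,  # higher value = higher scheduling priority
        "globalDefault": False,
        "description": description,
    }


PRIORITY_CLASSES = [
    priority_class(GUARANTEED_CLASS, 1000, "APIs guaranteed their max replicas"),
    priority_class(BEST_EFFORT_CLASS, 100, "APIs deployed with --no-replica-guarantee"),
]


def with_priority(pod_spec: dict, no_replica_guarantee: bool) -> dict:
    # priorityClassName is the PodSpec field that binds a pod to a PriorityClass.
    spec = dict(pod_spec)
    spec["priorityClassName"] = BEST_EFFORT_CLASS if no_replica_guarantee else GUARANTEED_CLASS
    return spec
```

Only APIs in the higher class would then need to be counted at their max replicas during validation, since pods in the lower class can be preempted to make room for them.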

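This way, compute resources aren't wasted: capacity reserved for the guaranteed APIs' max replicas can be used by APIs deployed with `--no-replica-guarantee` until the guaranteed APIs need it, at which point the lower-priority pods are preempted.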
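A toy model of why the reserved capacity isn't wasted, assuming a single pool of CPU and ignoring per-node placement. Guaranteed (higher-priority) pods are placed first, and pods from `--no-replica-guarantee` APIs use whatever is left, giving it back when guaranteed APIs scale up. `schedule` is a hypothetical helper, not the Kubernetes scheduler.

```python
from typing import Tuple


def schedule(capacity: int, guaranteed_demand: int, best_effort_demand: int) -> Tuple[int, int]:
    """Return (guaranteed_cpu_running, best_effort_cpu_running).

    Simplified priority preemption: higher-priority demand is satisfied first,
    and lower-priority demand only gets the remaining capacity.
    """
    guaranteed = min(guaranteed_demand, capacity)
    best_effort = min(best_effort_demand, capacity - guaranteed)
    return guaranteed, best_effort


# 10 CPU cluster; guaranteed APIs were validated to need at most 10 CPU at max replicas.
print(schedule(10, guaranteed_demand=4, best_effort_demand=8))   # (4, 6): idle capacity reused
print(schedule(10, guaranteed_demand=10, best_effort_demand=8))  # (10, 0): best-effort preempted
```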
