Description
When deploying a new API, we only validate its resource requests against the workloads currently scheduled on the cluster. We don't account for the case where every existing API is scaled up to its max number of replicas.
As a result, the cluster can become overcommitted, and existing APIs may be unable to scale up from their current number of replicas.
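To make the gap concrete, here is a minimal sketch comparing the current check (count only the replicas that are scheduled now) with a check that assumes every API, including the new one, runs at its max number of replicas. The `APISpec` type, its fields, and the single CPU dimension are hypothetical simplifications, not the actual deploy validation code.

```python
from dataclasses import dataclass


@dataclass
class APISpec:
    # Hypothetical model of an API's scaling config; only CPU is considered here.
    name: str
    cpu_per_replica: float  # CPU cores requested by one replica
    replicas: int           # replicas scheduled right now
    max_replicas: int       # upper bound the autoscaler may reach


def fits_current(apis, new_api, capacity):
    # Current behaviour: only the replicas scheduled right now are counted.
    used = sum(a.cpu_per_replica * a.replicas for a in apis)
    return used + new_api.cpu_per_replica * new_api.replicas <= capacity


def fits_at_max(apis, new_api, capacity):
    # Stricter check: assume every API, including the new one, is at max_replicas.
    needed = sum(a.cpu_per_replica * a.max_replicas for a in [*apis, new_api])
    return needed <= capacity
```

For example, on a 10-core cluster with an existing API at 2 of 8 replicas (1 core each), a new API at 2 of 4 replicas passes `fits_current` (4 cores in use) but fails `fits_at_max` (12 cores needed), which is exactly the overcommitment described above.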
Solution
We can also add a --no-replica-guarantee flag to the deploy command, which lets an API opt out of the guarantee. On the other hand, APIs deployed without the flag will be guaranteed their max number of replicas. We can achieve this using priority classes: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass.
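This approach doesn't waste compute resources: capacity reserved for the max replicas of guaranteed APIs can still be used by non-guaranteed APIs, whose lower-priority pods are preempted when a guaranteed API needs to scale up.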
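The sketch below illustrates the idea under stated assumptions. The priority class names and values (`api-guaranteed` = 1000, `api-no-replica-guarantee` = 100) are hypothetical; the manifest dicts only mirror the shape of a `scheduling.k8s.io/v1` PriorityClass. `schedule` is a toy single-node model of preemption, not kube-scheduler's actual algorithm, which also considers multiple nodes, PodDisruptionBudgets, and graceful termination.

```python
from dataclasses import dataclass

# Hypothetical PriorityClass names; the real ones would be chosen by the implementation.
GUARANTEED = "api-guaranteed"
NOT_GUARANTEED = "api-no-replica-guarantee"


def priority_class_manifest(name, value, description):
    # Same shape as a scheduling.k8s.io/v1 PriorityClass object.
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "globalDefault": False,
        "description": description,
    }


PRIORITY_CLASSES = {
    GUARANTEED: priority_class_manifest(GUARANTEED, 1000, "APIs guaranteed their max replicas"),
    NOT_GUARANTEED: priority_class_manifest(NOT_GUARANTEED, 100, "APIs deployed with --no-replica-guarantee"),
}


def priority_class_for(no_replica_guarantee: bool) -> str:
    # The deploy flag only decides which priorityClassName the API's pods get.
    return NOT_GUARANTEED if no_replica_guarantee else GUARANTEED


@dataclass
class Pod:
    api: str
    cpu: float
    priority_class: str

    @property
    def priority(self) -> int:
        return PRIORITY_CLASSES[self.priority_class]["value"]


def schedule(running, pod, capacity):
    """Toy model: returns (pods now running, evicted pods, whether `pod` was scheduled)."""
    used = sum(p.cpu for p in running)
    # Only strictly lower-priority pods are candidate victims, lowest priority first.
    lower = sorted((p for p in running if p.priority < pod.priority), key=lambda p: p.priority)
    # If evicting every candidate still can't make room, nothing is evicted and the pod stays pending.
    if used - sum(p.cpu for p in lower) + pod.cpu > capacity:
        return running, [], False
    evicted = []
    for victim in lower:
        if used + pod.cpu <= capacity:
            break
        evicted.append(victim)
        used -= victim.cpu
    kept = [p for p in running if all(p is not v for v in evicted)]
    return kept + [pod], evicted, True
```

With a 4-core node fully used by two guaranteed pods and two non-guaranteed pods (1 core each), scheduling one more guaranteed pod evicts a single non-guaranteed pod. A new non-guaranteed pod on a node full of guaranteed pods just stays pending, so guaranteed APIs are never displaced.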