Is your feature request related to a problem? Please describe.
As of today, the ruler shard size allocated to a tenant stays constant even when we scale up the number of ruler pods. This means that even when the total number of ruler pods increases, the number of rulers evaluating a tenant's rules does not. It would be great if more ruler pods could evaluate a tenant's rules when the ruler pod count increases, and fewer when it decreases. This would save the Cortex operator from having to manually adjust the ruler shard size whenever rulers scale up or down.
Describe the solution you'd like
Since the ruler shard size can also be overridden per tenant by setting ruler_tenant_shard_size, I'm proposing that we add support for the value to be a percentage as well. We can do this by allowing ruler_tenant_shard_size to be defined as either a float or an integer.
If ruler_tenant_shard_size is < 1, it'll be treated as a percentage.
If ruler_tenant_shard_size is >= 1, it'll be treated as a constant number.
For example, if ruler_tenant_shard_size is set to 0.3, the tenant will always have 30% of the ruler pods evaluating their rules. If ruler_tenant_shard_size is set to 50, at most 50 pods will evaluate the tenant's rules.
Describe alternatives you've considered
Another proposal that I've considered is also allowing ruler_tenant_shard_size to support a string value as well. The string value will indicate whether the tenant wants a small, medium or large shard size. We can also add new Cortex configurations to allow operators to set their desired percentage for each size.
The benefit of this approach is that the Cortex operator can still define shard sizes as a percentage, but scaled to the size of the tenant: a larger share of ruler pods evaluates rules for a larger tenant with more rules, and a smaller share evaluates rules for a smaller tenant with fewer rules. In the event that a tenant starts to create more rules, we can use the ruler_tenant_shard_size override for the tenant to change them from a "small" to a "medium" or from a "medium" to a "large".
Introducing three new Cortex configuration parameters (ruler_tenant_small_shard_size, ruler_tenant_medium_shard_size, ruler_tenant_large_shard_size) will also give flexibility to the Cortex operator to set percentages for shard sizes they deem appropriate for their workload. If no values are set for these parameters, then we will also provide default values to be used.
In the example configuration below we show how to set ruler_tenant_shard_size to different sizes as well as how to set percentages for small, medium and large.
ruler_tenant_shard_size: "small"
ruler_tenant_shard_size: "medium"
ruler_tenant_shard_size: "large"
ruler_tenant_small_shard_size: 30 # 30% of ruler pods will evaluate small tenants
ruler_tenant_medium_shard_size: 50 # 50% of ruler pods will evaluate medium tenants
ruler_tenant_large_shard_size: 70 # 70% of ruler pods will evaluate large tenants
Allowing ruler_tenant_shard_size to be defined as a string value as well will give Cortex operators the most flexibility in how they want the shard size to be defined:
As a constant (e.g. ruler_tenant_shard_size = 50)
As a fixed percentage (e.g. ruler_tenant_shard_size = 0.4)
As a named size whose percentage the operator can change (e.g. ruler_tenant_shard_size = "medium")
The drawback that I see with this approach is that it might overload ruler_tenant_shard_size by making a single parameter accept three different types of values.
Additional context
There was previous work done to introduce support for dynamic shard size values (#5374). We can use this to implement percentage-based shard sizes for rulers.