Percentage Based Sharding for Rulers #6652

Closed
wilguo opened this issue Mar 17, 2025 · 1 comment · Fixed by #6680
Labels
component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. type/feature

Comments

@wilguo
Contributor

wilguo commented Mar 17, 2025

Is your feature request related to a problem? Please describe.

As of today, the ruler shard size allocated to a tenant stays constant even when we scale up the number of ruler pods. This means that even when the total number of ruler pods increases, the number of rulers evaluating a tenant's rules does not. It would be great if more ruler pods evaluated a tenant's rules as ruler pods scale up, and fewer as they scale down. This should reduce the work for the Cortex operator, who currently has to adjust the ruler shard size manually whenever rulers scale up or down.

Describe the solution you'd like

Since the ruler shard size can also be overridden per tenant by setting ruler_tenant_shard_size, I'm proposing that we add support for the value to be a percentage as well. We can do this by allowing ruler_tenant_shard_size to be defined as a float or an integer.

[ruler_tenant_shard_size: <float> | <int> | default = 0]
  • If ruler_tenant_shard_size is < 1, it'll be treated as a percentage of the total ruler pods.
  • If ruler_tenant_shard_size is >= 1, it'll be treated as a constant number of ruler pods.

For example, if ruler_tenant_shard_size is set to 0.3, the tenant will always have 30% of the ruler pods evaluating their rules. If ruler_tenant_shard_size is set to 50, at most 50 pods will evaluate the tenant's rules.
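The two cases above can be sketched in Go roughly as follows. This is only an illustration: the helper name `resolveShardSize`, rounding a fractional result up, and capping a constant at the number of running rulers are all assumptions, not behavior that this proposal or Cortex specifies.

```go
package main

import (
	"fmt"
	"math"
)

// resolveShardSize is a hypothetical helper (not an existing Cortex function).
// It turns a configured ruler_tenant_shard_size into a number of ruler pods:
//   - value < 1:  treated as a fraction of the ruler pods currently running
//   - value >= 1: treated as a constant number of pods
func resolveShardSize(value float64, totalRulers int) int {
	if value <= 0 || totalRulers <= 0 {
		// Assumption: 0 (the default) and invalid inputs resolve to 0.
		return 0
	}
	if value < 1 {
		// Assumption: round up so a non-zero fraction yields at least one pod.
		return int(math.Ceil(value * float64(totalRulers)))
	}
	// Assumption: a constant is an upper bound ("at most N pods"), so it
	// cannot exceed the number of rulers that actually exist.
	size := int(value)
	if size > totalRulers {
		size = totalRulers
	}
	return size
}

func main() {
	// Compare a percentage (0.3) with a constant (50) as rulers scale up.
	for _, total := range []int{10, 20, 100} {
		fmt.Printf("rulers=%d  0.3 -> %d  50 -> %d\n",
			total, resolveShardSize(0.3, total), resolveShardSize(50, total))
	}
}
```

Under these assumptions, a value of 0.3 grows from 3 pods to 6 pods when the rulers scale from 10 to 20, without anyone touching the tenant override, while the constant 50 stays bounded by the number of pods available.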

Describe alternatives you've considered

Another proposal that I've considered is allowing ruler_tenant_shard_size to accept a string value as well. The string value indicates whether the tenant wants a small, medium or large shard size. We can also add new Cortex configurations to allow operators to set their desired percentage for each size.

[ruler_tenant_shard_size: <float> | <int> | <string> | default = 0]

# New Configurations
[ruler_tenant_small_shard_size: <int> | default = 20]
[ruler_tenant_medium_shard_size: <int> | default = 40]
[ruler_tenant_large_shard_size: <int> | default = 60]

The benefit of this approach is that the Cortex operator can still define shard sizes as a percentage, but scale that percentage with the size of the tenant: a larger share of ruler pods evaluates rules for a larger tenant with more rules, and fewer ruler pods evaluate rules for a smaller tenant with fewer rules. In the event that a tenant starts to create more rules, we can use the ruler_tenant_shard_size override for the tenant to change them from a "small" to a "medium" or from a "medium" to a "large".

Introducing three new Cortex configuration parameters (ruler_tenant_small_shard_size, ruler_tenant_medium_shard_size, ruler_tenant_large_shard_size) will also give flexibility to the Cortex operator to set percentages for shard sizes they deem appropriate for their workload. If no values are set for these parameters, then we will also provide default values to be used.

In the example configuration below we show how to set ruler_tenant_shard_size to different sizes as well as how to set percentages for small, medium and large.

ruler_tenant_shard_size: "small"    # per-tenant override; each tenant sets one of these three values
ruler_tenant_shard_size: "medium"
ruler_tenant_shard_size: "large"

ruler_tenant_small_shard_size: 30   # 30% of ruler pods will evaluate small tenants
ruler_tenant_medium_shard_size: 50  # 50% of ruler pods will evaluate medium tenants
ruler_tenant_large_shard_size: 70   # 70% of ruler pods will evaluate large tenants
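A minimal sketch of how a named size might be resolved against the three percentage settings. The type `sizePercentages`, the function `podsForSize`, and the choice to round up are hypothetical and used only for illustration; only the setting names in the comments come from the proposal.

```go
package main

import "fmt"

// sizePercentages is a hypothetical container for the three proposed settings.
type sizePercentages struct {
	Small  int // ruler_tenant_small_shard_size
	Medium int // ruler_tenant_medium_shard_size
	Large  int // ruler_tenant_large_shard_size
}

// podsForSize maps a named size ("small", "medium", "large") to a number of
// ruler pods, given the configured percentages and the current pod count.
func podsForSize(name string, p sizePercentages, totalRulers int) (int, error) {
	var percent int
	switch name {
	case "small":
		percent = p.Small
	case "medium":
		percent = p.Medium
	case "large":
		percent = p.Large
	default:
		return 0, fmt.Errorf("unknown shard size %q", name)
	}
	// Assumption: round up, using integer arithmetic to avoid float error.
	return (percent*totalRulers + 99) / 100, nil
}

func main() {
	// Percentages taken from the example configuration above.
	p := sizePercentages{Small: 30, Medium: 50, Large: 70}
	for _, name := range []string{"small", "medium", "large"} {
		n, err := podsForSize(name, p, 20)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d of 20 ruler pods\n", name, n)
	}
}
```

Moving a tenant from "small" to "medium" then only changes the name in the override; the percentages themselves remain a single operator-level decision.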

Supporting string values for ruler_tenant_shard_size as well will give Cortex operators the most flexibility in how they want the shard size to be defined:

  • As a constant (e.g. ruler_tenant_shard_size = 50)
  • As a fixed percentage (e.g. ruler_tenant_shard_size = 0.4)
  • As a set of percentages that can change (e.g. ruler_tenant_shard_size = "medium")

The drawback that I see with this approach is that it might overload ruler_tenant_shard_size as a parameter by making it support three different types of values.
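To make the overloading concrete, here is a rough sketch of the branching a parser would need if the value arrived as raw text. The names `shardSizeKind` and `classify` are hypothetical, and treating the value as a string before parsing is an assumption about how the setting would be read.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// shardSizeKind is a hypothetical tag for the three proposed value forms.
type shardSizeKind int

const (
	kindConstant shardSizeKind = iota // e.g. 50
	kindFraction                      // e.g. 0.4
	kindNamed                         // e.g. "medium"
)

func (k shardSizeKind) String() string {
	return [...]string{"constant", "fraction", "named"}[k]
}

// classify decides how a raw ruler_tenant_shard_size value would be read.
func classify(raw string) (shardSizeKind, error) {
	switch raw {
	case "small", "medium", "large":
		return kindNamed, nil
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid shard size %q: %w", raw, err)
	}
	if math.IsNaN(v) || math.IsInf(v, 0) || v < 0 {
		return 0, fmt.Errorf("invalid shard size %q: must be a finite, non-negative number", raw)
	}
	if v < 1 {
		return kindFraction, nil // below 1: percentage, as in the proposal
	}
	return kindConstant, nil // 1 or more: constant number of pods
}

func main() {
	for _, raw := range []string{"50", "0.4", "medium"} {
		kind, err := classify(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-8s -> %s\n", raw, kind)
	}
}
```

Every consumer of the setting would have to handle all three branches plus the error case, which is the extra complexity the drawback refers to.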

Additional context

There was previous work done for introducing support for dynamic shard size values (#5374). We can use this to implement percentage based shard sizes for rulers.

@dosubot dosubot bot added component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. type/feature labels Mar 17, 2025
@rajagopalanand
Contributor

+1 for fixed shard size as a percentage
