Percentage Based Sharding for Rulers

### Is your feature request related to a problem? Please describe.

As of today, the ruler shard size allocated to a tenant stays constant even when we scale the number of ruler pods up or down. This means that even when the total number of ruler pods increases, the number of rulers evaluating a tenant's rules does not. It would be great if more ruler pods could evaluate a tenant's rules when the ruler pod count increases, and fewer when it decreases. This would save the Cortex operator from having to manually adjust the ruler shard size whenever rulers scale up or down.

### Describe the solution you'd like

Since the ruler shard size can also be overridden per tenant by setting `ruler_tenant_shard_size`, I'm proposing that we add support for the value to be a percentage as well. We can do this by allowing `ruler_tenant_shard_size` to be defined as a float or an integer.

```
[ruler_tenant_shard_size: <float> | <int> | default = 0]
```

* If `ruler_tenant_shard_size` is < 1, it'll be treated as a percentage.
* If `ruler_tenant_shard_size` is >= 1, it'll be treated as a constant number.

For example, if `ruler_tenant_shard_size` is set to 0.3, the tenant will always have 30% of the ruler pods evaluating their rules. If `ruler_tenant_shard_size` is set to 50, at most 50 pods will evaluate the tenant's rules.
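A minimal Go sketch of how the rule above could be applied. `effectiveShardSize` is a hypothetical helper written for this issue, not an existing Cortex function, and rounding up when the percentage does not land on a whole pod is my assumption. A value of 0 is passed through unchanged, and capping a constant at the number of available rulers is left to the caller.

```go
package main

import (
	"fmt"
	"math"
)

// effectiveShardSize is a hypothetical helper (not part of Cortex).
// A configured value strictly between 0 and 1 is read as a fraction of the
// available rulers; any other value is read as a constant number of rulers.
// Rounding up is an assumption made for this sketch.
func effectiveShardSize(configured float64, numRulers int) int {
	if configured > 0 && configured < 1 {
		return int(math.Ceil(configured * float64(numRulers)))
	}
	return int(configured)
}

func main() {
	// The percentage form follows the ruler count; the constant form does not.
	for _, n := range []int{10, 20, 40} {
		fmt.Printf("rulers=%d  0.3 -> %d  50 -> %d\n",
			n, effectiveShardSize(0.3, n), effectiveShardSize(50, n))
	}
}
```

With 10 rulers, a value of 0.3 resolves to 3 rulers, and the same setting resolves to more rulers as the pool grows, which is the behaviour this issue asks for.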

### Describe alternatives you've considered

Another proposal that I've considered is allowing `ruler_tenant_shard_size` to support a string value as well. The string value would indicate whether the tenant wants a small, medium or large shard size. We can also add new Cortex configurations to allow operators to set their desired percentage for each size.

```
[ruler_tenant_shard_size: <float> | <int> | <string> | default = 0]

# New Configurations
[ruler_tenant_small_shard_size: <int> | default = 20]
[ruler_tenant_medium_shard_size: <int> | default = 40]
[ruler_tenant_large_shard_size: <int> | default = 60]
```

The benefit of this approach is that the Cortex operator can still define shard sizes as a percentage, but vary that percentage with the size of the tenant: a larger share of ruler pods evaluates rules for a larger tenant with more rules, and a smaller share evaluates rules for a smaller tenant with fewer rules. If a tenant starts to create more rules, we can use the `ruler_tenant_shard_size` override for that tenant to move them from `"small"` to `"medium"`, or from `"medium"` to `"large"`.

Introducing three new Cortex configuration parameters (`ruler_tenant_small_shard_size`, `ruler_tenant_medium_shard_size`, `ruler_tenant_large_shard_size`) will also give the Cortex operator the flexibility to set the shard size percentages they deem appropriate for their workload. If no values are set for these parameters, default values will be used.

In the example configuration below we show how to set `ruler_tenant_shard_size` to different sizes as well as how to set percentages for small, medium and large.

```
# Per-tenant override: set one of the following
ruler_tenant_shard_size: "small"
ruler_tenant_shard_size: "medium"
ruler_tenant_shard_size: "large"

# Percentages used for each named size
ruler_tenant_small_shard_size: 30   # 30% of ruler pods will evaluate small tenants
ruler_tenant_medium_shard_size: 50  # 50% of ruler pods will evaluate medium tenants
ruler_tenant_large_shard_size: 70   # 70% of ruler pods will evaluate large tenants
```

Supporting string values for `ruler_tenant_shard_size` as well would give Cortex operators the most flexibility in how they want the shard size to be defined:
* As a constant (e.g. `ruler_tenant_shard_size = 50`)
* As a fixed percentage (e.g. `ruler_tenant_shard_size = 0.4`)
* As a set of percentages that can change (e.g. `ruler_tenant_shard_size = "medium"`)

The drawback that I see with this approach is that it might overload `ruler_tenant_shard_size` by making a single parameter support three different types of values.
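To make the overloading concrete, here is a rough Go sketch of resolving all three forms from one field. Everything in it is hypothetical: `sizePercents` stands in for the three proposed operator settings, `resolveShardSize` is an illustrative helper, the override is carried as text so that all three forms fit one type, and rounding up is my assumption. It is not how Cortex parses limits today.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// sizePercents stands in for the three proposed operator-level settings,
// expressed as whole percentages (hypothetical type).
type sizePercents struct {
	Small, Medium, Large int
}

// resolveShardSize is a hypothetical helper. raw is the tenant's
// ruler_tenant_shard_size override: a named size, a fraction below 1,
// or a constant number of rulers.
func resolveShardSize(raw string, p sizePercents, numRulers int) (int, error) {
	// Integer ceiling of pct% of numRulers, avoiding float rounding error.
	fromPercent := func(pct int) int {
		return (pct*numRulers + 99) / 100
	}
	switch raw {
	case "small":
		return fromPercent(p.Small), nil
	case "medium":
		return fromPercent(p.Medium), nil
	case "large":
		return fromPercent(p.Large), nil
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil || math.IsNaN(v) || math.IsInf(v, 0) || v < 0 {
		return 0, fmt.Errorf("invalid ruler_tenant_shard_size %q", raw)
	}
	if v > 0 && v < 1 {
		// Fixed percentage: a fraction of the available rulers, rounded up.
		return int(math.Ceil(v * float64(numRulers))), nil
	}
	// Constant number of rulers.
	return int(v), nil
}

func main() {
	defaults := sizePercents{Small: 20, Medium: 40, Large: 60}
	for _, raw := range []string{"small", "medium", "large", "0.4", "50"} {
		n, err := resolveShardSize(raw, defaults, 10)
		fmt.Println(raw, n, err)
	}
}
```

The single switch plus numeric fallback shows where the three meanings meet. A real implementation would also need to decide what an unknown string or a negative number means; this sketch rejects both.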

### Additional context

There was previous work done to introduce support for dynamic shard size values (https://github.com/cortexproject/cortex/issues/5374). We can use this to implement percentage-based shard sizes for rulers.


Percentage Based Sharding for Rulers #6652
