Spread matching partitions across nodes #7717

Open

wants to merge 3 commits into main

Conversation

dnr (Member) commented May 6, 2025

What changed?

Add option to use LookupN instead of Lookup for matching partition routing.

Why?

This can help spread partitions across available nodes better: instead of placing each partition independently, they're always placed on separate nodes, as long as there are enough nodes, up to the batch size.
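For illustration only, here is a minimal sketch of the batching idea, not the actual diff: the Resolver interface, the key format, and the spreadLookup name below are assumptions made for the example.

package routing

import (
	"errors"
	"fmt"
)

// Resolver is an assumed stand-in for the membership resolver; the real
// matching service resolver's LookupN returns up to n hosts for a key.
type Resolver interface {
	Lookup(key string) (string, error)
	LookupN(key string, n int) []string
}

// spreadLookup routes a task queue partition to a host. With batchSize == 0 it
// behaves like today's independent Lookup per partition; with batchSize > 0,
// partitions in the same batch share one LookupN call and take distinct hosts
// from its result, wrapping around if there are fewer hosts than batch slots.
func spreadLookup(r Resolver, taskQueueKey string, partition, batchSize int) (string, error) {
	if batchSize <= 0 {
		return r.Lookup(fmt.Sprintf("%s/%d", taskQueueKey, partition))
	}
	batchKey := fmt.Sprintf("%s/%d", taskQueueKey, partition/batchSize)
	hosts := r.LookupN(batchKey, batchSize)
	if len(hosts) == 0 {
		return "", errors.New("no hosts available")
	}
	n := partition % batchSize
	if n >= len(hosts) {
		n %= len(hosts) // fewer hosts than the batch size: cycle around
	}
	return hosts[n], nil
}

The wrap-around at the end is the part discussed in the review thread below.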

How did you test it?

Added a unit test; will use CI/CD to test with this setting.

Potential risks

If this dynamic config is changed on a live cluster, it may cause a period of task queue thrashing and delayed tasks. When changing it on a live cluster, try to change it on all nodes at the same time.

@dnr dnr requested a review from a team as a code owner May 6, 2025 18:02
Comment on lines +226 to 228
if n >= len(hosts) {
	n %= len(hosts)
}
Collaborator

This part could lead to inconsistent hashing when a host goes down/up. Is the assumption that n is comfortably lower than the # of hosts?

Member Author

The best case is to have #partitions <= #hosts, yeah. But it doesn't have to be...

The question is just, what happens if you have 2 hosts and 4 partitions? You want 2+2, not 3+1 or 4+0. How do you do that? You allocate 0 to host 0, 1 to host 1, then you have to cycle around, 2 to host 0, 3 to host 1. Mod does that.

When hosts go up and down, partitions move anyway. I think this would lead to more frequent movement for the higher-numbered partitions (if I understand the ringpop consistent hashing). We could make the batch size smaller to reduce that effect.
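To make the wrap-around concrete, a tiny standalone example (hypothetical host names, same mod logic as the snippet above):

package main

import "fmt"

func main() {
	// 2 hosts, 4 partitions in one batch: partitions cycle around the host list.
	hosts := []string{"host-0", "host-1"}
	for partition := 0; partition < 4; partition++ {
		n := partition
		if n >= len(hosts) {
			n %= len(hosts)
		}
		fmt.Printf("partition %d -> %s\n", partition, hosts[n])
	}
	// Prints a 2+2 split: 0 -> host-0, 1 -> host-1, 2 -> host-0, 3 -> host-1.
}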

MatchingSpreadRoutingBatchSize = NewGlobalIntSetting(
	"matching.spreadRoutingBatchSize",
	0,
	`If non-zero, try to spread task queue partitions across matching nodes better, using the given batch size.`,
)
Collaborator

It seems the assumption is that the batch size will be set to a number that is comfortably lower than the number of matching pods, right? Should we mention that here?

Collaborator

Also should we mention the risk of changing this number after a cluster starts getting traffic?

Member Author

I was planning to set it to something like 16. It doesn't have to be smaller than the number of pods. It should be at least as big as the default partition count (so >= 4).
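For reference, setting this to 16 through a dynamic config YAML file would look roughly like the snippet below; the key name comes from the setting above, and the exact file layout depends on how your cluster loads dynamic config, so treat this as a sketch. Per the risks section, apply it to all matching nodes at the same time.

matching.spreadRoutingBatchSize:
  - value: 16
    constraints: {}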
