Spread matching partitions across nodes #7717
base: main
Conversation
if n >= len(hosts) {
    n %= len(hosts)
}
This part could lead to inconsistent hashing when a host goes down/up. Is the assumption that n is comfortably lower than the # of hosts?
The best case is to have #partitions <= #hosts, yeah. But it doesn't have to be...
The question is just, what happens if you have 2 hosts and 4 partitions? You want 2+2, not 3+1 or 4+0. How do you do that? You allocate 0 to host 0, 1 to host 1, then you have to cycle around, 2 to host 0, 3 to host 1. Mod does that.
When hosts go up and down, partitions move anyway. I think this would lead to more frequent movement for the higher-numbered partitions (if I understand the ringpop consistent hashing). We could make the batch size smaller to reduce that effect.
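To make the wrap-around concrete, here is a minimal standalone sketch (not the PR's actual routing code, just the mod idea applied to a host list such as one returned by LookupN): with 2 hosts and 4 partitions you get a 2+2 split.

```go
package main

import "fmt"

// pickHost assigns partition n to one of the hosts, wrapping around with mod
// when there are more partitions than hosts.
func pickHost(n int, hosts []string) string {
	if n >= len(hosts) {
		n %= len(hosts)
	}
	return hosts[n]
}

func main() {
	hosts := []string{"host-0", "host-1"} // fewer hosts than partitions
	for partition := 0; partition < 4; partition++ {
		fmt.Printf("partition %d -> %s\n", partition, pickHost(partition, hosts))
	}
	// partitions 0..3 map to host-0, host-1, host-0, host-1: a 2+2 split
}
```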
MatchingSpreadRoutingBatchSize = NewGlobalIntSetting(
    "matching.spreadRoutingBatchSize",
    0,
    `If non-zero, try to spread task queue partitions across matching nodes better, using the given batch size.`,
It seems the assumption is that the batch size will be set to a number that is comfortably lower than the number of matching pods, right? Should we mention that here?
Also, should we mention the risk of changing this number after a cluster starts getting traffic?
I was planning to set it to something like 16. It doesn't have to be smaller than the number of pods. It should be at least as big as the default number of partitions (so >= 4).
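For illustration only (the actual key derivation in this PR may differ, and the names below are made up): one way a batch size could group partitions is that each run of batchSize consecutive partition indexes shares a routing key, and the offset within the batch picks a host from the ordered LookupN result.

```go
package main

import "fmt"

// batchKeyAndOffset is a hypothetical helper: it groups partition index p into
// batches of batchSize, so each batch can be routed with one LookupN call and
// the offset selects a host from the returned ordered list.
func batchKeyAndOffset(taskQueue string, p, batchSize int) (key string, offset int) {
	batchStart := (p / batchSize) * batchSize
	return fmt.Sprintf("%s/%d", taskQueue, batchStart), p % batchSize
}

func main() {
	for p := 0; p < 6; p++ {
		key, off := batchKeyAndOffset("my-tq", p, 4)
		fmt.Printf("partition %d -> batch key %q, offset %d\n", p, key, off)
	}
	// partitions 0-3 share one batch key; partitions 4-5 start a new batch
}
```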
What changed?
Add an option to use LookupN instead of Lookup for matching partition routing.
Why?
This can help spread partitions across available nodes better: instead of placing each partition independently, they're always placed on separate nodes, as long as there are enough nodes, up to the batch size.
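As a sketch of the idea (using a toy ring rather than ringpop, with assumed shapes for Lookup/LookupN): per-partition Lookup hashes each key independently and can land several partitions on the same node, while one LookupN call returns distinct nodes that the partitions are dealt across.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toyRing is a stand-in for a consistent-hash ring; it is not ringpop and only
// exists to contrast per-key Lookup with batched LookupN placement.
type toyRing struct{ hosts []string }

// Lookup hashes the key to a single host; different keys can collide on the
// same host.
func (r toyRing) Lookup(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return r.hosts[int(h.Sum32()%uint32(len(r.hosts)))]
}

// LookupN returns up to n distinct hosts in a deterministic order starting at
// the hashed position (a rough stand-in for walking the ring).
func (r toyRing) LookupN(key string, n int) []string {
	h := fnv.New32a()
	h.Write([]byte(key))
	start := int(h.Sum32() % uint32(len(r.hosts)))
	if n > len(r.hosts) {
		n = len(r.hosts)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, r.hosts[(start+i)%len(r.hosts)])
	}
	return out
}

func main() {
	ring := toyRing{hosts: []string{"node-a", "node-b", "node-c", "node-d"}}

	// Independent placement: each partition key is hashed on its own, so
	// several partitions may land on the same node.
	for p := 0; p < 4; p++ {
		fmt.Printf("Lookup  partition %d -> %s\n", p, ring.Lookup(fmt.Sprintf("my-tq/%d", p)))
	}

	// Batched placement: one LookupN call for the batch, then deal partitions
	// across the returned hosts (wrapping if partitions outnumber hosts).
	hosts := ring.LookupN("my-tq", 4)
	for p := 0; p < 4; p++ {
		fmt.Printf("LookupN partition %d -> %s\n", p, hosts[p%len(hosts)])
	}
}
```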
How did you test it?
Added a unit test; will use cicd to test with this setting.
Potential risks
If this dynamic config is changed on a live cluster, it may cause a period of task queue thrashing and delayed tasks. When changing it on a live cluster, try to change it on all nodes at the same time.