Spread matching partitions across nodes #7717
base: main
Conversation
if n >= len(hosts) {
    n %= len(hosts)
}
This part could lead to inconsistent hashing when a host goes down/up. Is the assumption that n is comfortably lower than the # of hosts?
The best case is to have #partitions <= #hosts, yeah. But it doesn't have to be...
The question is just, what happens if you have 2 hosts and 4 partitions? You want 2+2, not 3+1 or 4+0. How do you do that? You allocate 0 to host 0, 1 to host 1, then you have to cycle around, 2 to host 0, 3 to host 1. Mod does that.
When hosts go up and down, partitions move anyway. I think this would lead to more frequent movement for the higher-numbered partitions (if I understand the ringpop consistent hashing). We could make the batch size smaller to reduce that effect.
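To make the wrap-around concrete, here is a minimal standalone sketch (not the PR's actual routing code, just the mod idea applied to a host list such as one returned by LookupN): with 2 hosts and 4 partitions you get a 2+2 split.

```go
package main

import "fmt"

// pickHost assigns partition n to one of the hosts, wrapping around with mod
// when there are more partitions than hosts.
func pickHost(n int, hosts []string) string {
	if n >= len(hosts) {
		n %= len(hosts)
	}
	return hosts[n]
}

func main() {
	hosts := []string{"host-0", "host-1"} // fewer hosts than partitions
	for partition := 0; partition < 4; partition++ {
		fmt.Printf("partition %d -> %s\n", partition, pickHost(partition, hosts))
	}
	// partitions 0..3 map to host-0, host-1, host-0, host-1: a 2+2 split
}
```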
MatchingSpreadRoutingBatchSize = NewGlobalIntSetting(
    "matching.spreadRoutingBatchSize",
    0,
    `If non-zero, try to spread task queue partitions across matching nodes better, using the given batch size.`,
It seems the assumption is that the batch size will be set to a number that is comfortably lower than the number of matching pods, right? Should we mention that here?
Also, should we mention the risk of changing this number after a cluster starts getting traffic?
I was planning to set it to something like 16. It doesn't have to be smaller than the number of pods. It should be at least as big as the default number of partitions (so >= 4).
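For illustration only (the actual key derivation in this PR may differ, and the names below are made up): one way a batch size could group partitions is that each run of batchSize consecutive partition indexes shares a routing key, and the offset within the batch picks a host from the ordered LookupN result.

```go
package main

import "fmt"

// batchKeyAndOffset is a hypothetical helper: it groups partition index p into
// batches of batchSize, so each batch can be routed with one LookupN call and
// the offset selects a host from the returned ordered list.
func batchKeyAndOffset(taskQueue string, p, batchSize int) (key string, offset int) {
	batchStart := (p / batchSize) * batchSize
	return fmt.Sprintf("%s/%d", taskQueue, batchStart), p % batchSize
}

func main() {
	for p := 0; p < 6; p++ {
		key, off := batchKeyAndOffset("my-tq", p, 4)
		fmt.Printf("partition %d -> batch key %q, offset %d\n", p, key, off)
	}
	// partitions 0-3 share one batch key; partitions 4-5 start a new batch
}
```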
What changed?
Add an option to use LookupN instead of Lookup for matching partition routing.
Why?
This can help spread partitions across available nodes better: instead of placing each partition independently, they're always placed on separate nodes, as long as there are enough nodes, up to the batch size.
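As a sketch of the idea (using a toy ring rather than ringpop, with assumed shapes for Lookup/LookupN): per-partition Lookup hashes each key independently and can land several partitions on the same node, while one LookupN call returns distinct nodes that the partitions are dealt across.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// toyRing is a stand-in for a consistent-hash ring; it is not ringpop and only
// exists to contrast per-key Lookup with batched LookupN placement.
type toyRing struct{ hosts []string }

// Lookup hashes the key to a single host; different keys can collide on the
// same host.
func (r toyRing) Lookup(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return r.hosts[int(h.Sum32()%uint32(len(r.hosts)))]
}

// LookupN returns up to n distinct hosts in a deterministic order starting at
// the hashed position (a rough stand-in for walking the ring).
func (r toyRing) LookupN(key string, n int) []string {
	h := fnv.New32a()
	h.Write([]byte(key))
	start := int(h.Sum32() % uint32(len(r.hosts)))
	if n > len(r.hosts) {
		n = len(r.hosts)
	}
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, r.hosts[(start+i)%len(r.hosts)])
	}
	return out
}

func main() {
	ring := toyRing{hosts: []string{"node-a", "node-b", "node-c", "node-d"}}

	// Independent placement: each partition key is hashed on its own, so
	// several partitions may land on the same node.
	for p := 0; p < 4; p++ {
		fmt.Printf("Lookup  partition %d -> %s\n", p, ring.Lookup(fmt.Sprintf("my-tq/%d", p)))
	}

	// Batched placement: one LookupN call for the batch, then deal partitions
	// across the returned hosts (wrapping if partitions outnumber hosts).
	hosts := ring.LookupN("my-tq", 4)
	for p := 0; p < 4; p++ {
		fmt.Printf("LookupN partition %d -> %s\n", p, hosts[p%len(hosts)])
	}
}
```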
How did you test it?
Added a unit test; will use cicd to test with this setting.
Potential risks
If this dynamic config is changed on a live cluster, it may cause a period of task queue thrashing and delayed tasks. When changing it on a live cluster, try to change it on all nodes at the same time.