RFC-0075: Faster Failover and Configuration Push #123

Open · avsej wants to merge 16 commits into `master`
Conversation

@avsej (Member) commented Jun 14, 2023

No description provided.

@avsej avsej force-pushed the 0075-faster-failover-and-configuration-push branch 8 times, most recently from fe812c0 to 5978c09 Compare June 20, 2023 11:58
@avsej avsej force-pushed the 0075-faster-failover-and-configuration-push branch 2 times, most recently from 0ffe096 to 6840048 Compare June 21, 2023 14:49
@avsej avsej force-pushed the 0075-faster-failover-and-configuration-push branch from 6840048 to 6914a45 Compare June 21, 2023 14:50
avsej added 4 commits June 27, 2023 14:43
…r-and-configuration-push

* origin/master:
  Update links in 0048-sdk3-bootstrapping.md
* clarify bootstrap changes
* explain SnappyEverywhere name
@avsej avsej requested review from chvck and brett19 June 27, 2023 12:37
@avsej avsej force-pushed the 0075-faster-failover-and-configuration-push branch from 49e926d to af58f5a Compare June 27, 2023 19:52

## Feature Checklist

> 1. `GetClusterConfigWithKnownVersion` (`0x1d`). The SDK should always supply the current configuration version (the `epoch`/`revision` pair). Once the configuration is applied, the SDK must check whether the new configuration routes the operation to a new endpoint or to a new vbucket on the old endpoint, and *immediately* dispatch the operation to the new endpoint (or to the same endpoint if the vbucketID has changed). In any other case, the SDK should send the operation to the retry orchestrator.

Contributor commented:

In the case that we're using known versions and CCCP polling: if the node we've polled returns success but no body, should we consider that a successful poll and retry again after the poll period, or should we try the next node to see if it has a newer config?
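The dispatch rule in the quoted text can be sketched as follows. This is an illustrative model only: the `Config` class and its lookups are hypothetical stand-ins, not any SDK's actual API.

```python
from dataclasses import dataclass

@dataclass
class Config:
    """Toy cluster config: maps keys to vbuckets and vbuckets to endpoints."""
    vb_map: dict      # key -> vbucket id
    endpoints: dict   # vbucket id -> "host:port"

def route_after_config(key, old: Config, new: Config):
    """Decide what to do with an operation after applying a newer config."""
    old_vb = old.vb_map[key]
    new_vb = new.vb_map[key]
    old_ep = old.endpoints[old_vb]
    new_ep = new.endpoints[new_vb]
    if new_ep != old_ep or new_vb != old_vb:
        # The op now routes to a new endpoint, or to a new vbucket on the
        # same endpoint: dispatch immediately, per the quoted rule.
        return ("dispatch", new_ep, new_vb)
    # Nothing changed for this operation: hand it to the retry orchestrator.
    return ("retry", None, None)
```

The "any other case" branch at the end is the one the review thread below debates.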
@programmatix (Contributor) commented Jul 5, 2023:

I'm not sure I get this part. Say the SDK is doing N concurrent operations per node. With dedup, just one of those operations will get an NMVB with a config; the other N-1 will get an NMVB with an empty body. Why is there a bunch of special-case logic for the single operation? It would be a lot simpler to implement in the JVM clients if we could asynchronously apply the NMVB config body (all config updating is async in the JVM world) and send all requests into the RetryOrchestrator, but on a tight retry of a few milliseconds. That would be pretty easy to implement, and those few milliseconds wouldn't seem to impact fast failover at all.

(There is a tiny risk that the config wouldn't have been applied before those few milliseconds elapsed. But that would just generate a bit of extra traffic and a few more NMVBs.)

@brett19 (Contributor) commented Jul 5, 2023:

@programmatix In that case, how would you provide determinism that the config had been applied before you attempted to execute the operation again?

Contributor replied:

I haven't pathfound in the JVM clients yet (David's assigned, but he asked me to look at it while he's away), but IIRC the config mechanism is an async pub-sub: something pushes a config, and all subscribers will, unless something has gone tragically wrong, get that config within nanoseconds. So I'm proposing (and I may want to revise this after pathfinding) permitting an alternative setup that's technically non-deterministic, but will be deterministic in practice in basically all cases: send the new config to the config publisher, and schedule the operation to be retried in X milliseconds. It should be easy to implement, and it's much less risky than introducing a new blocking mechanism into the config publisher that would be challenging to test and would only trigger during rebalances. And if something odd happens and the config publisher hasn't published within X milliseconds, then all that happens is the operations are retried, we get another NMVB, and we go around again. That would mean some more traffic to the cluster, but to stress: this is an in-memory pub-sub stream, and it should always have delivered the config within those X milliseconds.

Contributor commented:

I agree with Graham's commentary. "Immediately retry now" isn't feasible for the JVM SDKs.
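The alternative described above (publish the config asynchronously, then re-drive the operation on a short timer) can be sketched like this. The `ConfigPublisher`, the 2 ms delay, and the callback shapes are all hypothetical; this models the proposal, not any SDK's real internals.

```python
import threading

class ConfigPublisher:
    """Toy in-memory pub-sub: each published config fans out to subscribers."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, config):
        for cb in self._subscribers:
            cb(config)

RETRY_DELAY_SECONDS = 0.002  # the "X milliseconds" from the comment; illustrative

def on_nmvb(publisher, config_body, op, dispatch):
    """Handle an NMVB response: apply config asynchronously, retry on a timer."""
    # Deduplicated responses may carry an empty body; only publish real configs.
    if config_body is not None:
        publisher.publish(config_body)
    # Schedule the retry. By the time it fires, the config has almost certainly
    # been applied; if not, we simply see another NMVB and go around again.
    timer = threading.Timer(RETRY_DELAY_SECONDS, dispatch, args=(op,))
    timer.start()
    return timer
```

In this model the N-1 operations whose NMVB carried an empty body take exactly the same path as the one carrying the config, so no special-case logic for the single config-bearing operation is needed.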

| .NET | Jeffry Morris | | |
| C/C++ | Sergey Avseyev | | |
| Go | Charles Dixon | | |
| Java/Kotlin | David Nault | | |

Contributor suggested a change: replace `| Java/Kotlin | David Nault | | |` with `| Java/Kotlin | David Nault | 2023-07-18 | #1 |`

dnault previously approved these changes Jul 18, 2023
@jeffrymorris (Contributor) left a comment:

A couple of questions regarding the packet layout for GET_CONFIG and CLUSTERMAP_CHANGE_NOTIFICATION. They look like typos, but they're confusing to me.

Co-authored-by: David Nault <dnault@users.noreply.github.com>
@ingenthr (Contributor) commented Jun 3, 2024:

Hey @avsej, @brett19, @dnault, et al. It looks like this is a bit stuck? What do we need to resolve to get it unstuck?

8 participants