-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
RFC-0075: Faster Failover and Configuration Push #123
base: master
Are you sure you want to change the base?
RFC-0075: Faster Failover and Configuration Push #123
Conversation
fe812c0
to
5978c09
Compare
0ffe096
to
6840048
Compare
6840048
to
6914a45
Compare
…r-and-configuration-push * origin/master: Update links in 0048-sdk3-bootstrapping.md
49e926d
to
af58f5a
Compare
|
||
## Feature Checklist | ||
|
||
1. `GetClusterConfigWithKnownVersion` (`0x1d`). The SDK should always supply current configuration version if the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case that we're using known versions and cccp polling, if the node we've polled returns success but no body should we consider that a successful poll and retry again after the poll period or should we try the next node to see if it has a newer config?
`epoch`/`revision` pair. | ||
Once configuration applied, the SDK must check if new configuration routes the operation to new endpoint or new | ||
vbucket on the old endpoint, and *immediately* dispatch operation to new endpoint (or same endpoint in case vbucketID | ||
has changed). In any other case, the SDK should send operation to retry orchestrator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I get this part. Say the SDK is doing N concurrent operations per node. With dedup, just one of those operations will get a NVMB with a config - the other N-1 will get a NVMB with an empty body. Why is there a bunch of special case logic for the single operation? It would be a lot simpler to implement in the JVM clients if we could asynchronously apply the NVMB config body (all config updating is async in JVM world), and send all requests into the RetryOrchestrator but on a tight retry - like a few milliseconds. That would be pretty easy to implement and those few milliseconds wouldn't seem to impact fast failover at all.
(There is a tiny risk that the config wouldn't have been applied before those few milliseconds elapsed. But this would just generate a bit of extra traffic and a few more NVMBs.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@programmatix In that case, how would you provide determinism that the config had been applies before you attempted to execute the operation again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't pathfound in the JVM clients yet (David's assigned but asked me to look at it while he's away), but IIRC the config mechanism is an async pub-sub. Something pushes a config and all subscribers will, unless something has gone tragically wrong, get that config in nanoseconds. So I'm proposing - and I may want to revise this after pathfinding - permitting an alternative setup that's technically non-deterministic, but will be in practice in basically all cases: send the new config to the config publisher, and scheduled the operation to be retried in X milliseconds. It should be easy to implement, and it's much less risk than introducing a new blocking mechanism to the config publisher that will be challenging to test and will only trigger during rebalances. And if something odd happens and the config publisher hasn't published by X milliseconds, then all that happens is operations are retried, we get another NVMB, and go around again. That would be some more traffic to the cluster - but to stress, this is an in-memory pub-sub stream, and it should always have delivered the config within those X millis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Graham's commentary. "Immediately retry now" isn't feasible for the JVM SDKs.
| .NET | Jeffry Morris | | | | ||
| C/C++ | Sergey Avseyev | | | | ||
| Go | Charles Dixon | | | | ||
| Java/Kotlin | David Nault | | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Java/Kotlin | David Nault | | | | |
| Java/Kotlin | David Nault | 2023-07-18 | #1 | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of questions regarding the packet layout for GET_CONFIG and CLUSTERMAP_CHANGE_NOTIFICATION. Looks like typos, but is confusing to me.
Co-authored-by: David Nault <dnault@users.noreply.github.com>
No description provided.