Behaviour when all redis nodes go down simultaneously. #152

Open
rahulKrishnaM opened this issue Mar 15, 2023 · 2 comments

Comments


rahulKrishnaM commented Mar 15, 2023

Hi @bjosv,

Let me explain the scenario. We have a set of Redis pods (say 3 master shards, each with 1 replica) behind a Kubernetes Service IP. When connecting through hiredis-cluster, we initially pass the Service IP as the target address. hiredis-cluster then discovers the IPs behind the Service by means of the 'cluster nodes' command and updates its internal map so that the newly learned IPs (the 3 master IPs) become the reachable addresses.
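To make the discovery step concrete, here is a minimal sketch of pulling a node address out of one line of 'cluster nodes' output. The helper name `parse_cluster_nodes_line` is hypothetical and this is not how hiredis-cluster itself parses the reply; it only assumes the documented line layout `<id> <ip:port@cport[,hostname]> <flags> ...`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of hiredis-cluster): extracts "ip:port"
 * from one line of CLUSTER NODES output and reports whether the node is
 * flagged as a master. Returns 0 on success, -1 if the line is malformed. */
static int parse_cluster_nodes_line(const char *line, char *addr,
                                    size_t addr_len, int *is_master) {
    char id[64], endpoint[128], flags[64];
    size_t n;

    /* First three fields: <id> <ip:port@cport[,hostname]> <flags> */
    if (sscanf(line, "%63s %127s %63s", id, endpoint, flags) != 3)
        return -1;

    /* Keep only "ip:port": stop at the cluster bus port or hostname. */
    n = strcspn(endpoint, "@,");
    if (n == 0 || n >= addr_len)
        return -1;
    memcpy(addr, endpoint, n);
    addr[n] = '\0';

    /* Flags are comma separated, e.g. "myself,master" or "slave". */
    *is_master = (strstr(flags, "master") != NULL);
    return 0;
}
```

Every address collected this way is a pod IP, which is why nothing in the learned set points back at the Service.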

But let's say that at some point all 3 master IPs (and their replicas as well) go down simultaneously. I want to understand what the recovery mechanism is in such a scenario. If my understanding is correct, since hiredis-cluster doesn't keep the k8s Service IP stored, it won't be able to relearn the new IPs after the reboot.

Is reinitialising hiredis-cluster the only way forward?

Collaborator

bjosv commented Mar 15, 2023

Interesting. As of now, your assessment that a re-initialization is needed in this scenario is correct.
The initially given addresses are replaced with the knowledge (addresses) obtained from the Redis cluster.
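As a rough illustration of the application-side workaround, the sketch below retries a re-initialization against the original seed address. All names here (`connect_fn`, `reinit_from_seed`, `fake_connect`) are hypothetical; in a real application the callback would presumably wrap the usual hiredis-cluster setup calls (for example `redisClusterContextInit`, `redisClusterSetOptionAddNodes` and `redisClusterConnect2`) and return the new context.

```c
#include <stddef.h>

/* Hypothetical callback: builds a fresh cluster context from the seed
 * address and returns it, or NULL while the cluster is unreachable. */
typedef void *(*connect_fn)(const char *seed_addr);

/* Retries re-initialization against the seed (e.g. the k8s Service) up to
 * max_attempts times. Returns the new context, or NULL if all attempts
 * failed. The number of attempts made is stored in *attempts_used. */
static void *reinit_from_seed(const char *seed_addr, connect_fn connect_cb,
                              int max_attempts, int *attempts_used) {
    void *ctx = NULL;
    int i;

    for (i = 0; i < max_attempts && ctx == NULL; i++) {
        ctx = connect_cb(seed_addr);
        /* A real application would sleep or back off between attempts. */
    }
    if (attempts_used != NULL)
        *attempts_used = i;
    return ctx;
}

/* Stand-in for demonstration only: fails twice, then "connects". */
static int fake_calls = 0;
static void *fake_connect(const char *seed_addr) {
    static int dummy_ctx;
    (void)seed_addr;
    return (++fake_calls >= 3) ? (void *)&dummy_ctx : NULL;
}
```

The key point is that the application, not the library, has to remember the seed address for this to work.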

Is this triggered by tests or is it a likely scenario with the Redis operator you are using?

What do you believe is the best way to handle this scenario?
Should the initially added nodes be kept and used as a last resort when attempting to get the slot information from the cluster?
Or should there be an additional API for this?

@rahulKrishnaM
Author

Yes, this particular scenario was attempted as part of some resiliency tests.

I was thinking that storing the k8s FQDN/IP initially passed to hiredis-cluster in the cache would be the best option. Since the Service is not expected to change at any point (even in power-failure scenarios), tying it into the current cluster discovery mechanism would let us fall back to the k8s Service if all the previously learned nodes return an error or something like that.
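A minimal sketch of that fallback idea, assuming the library kept the seed next to the learned nodes. The `node_table` struct, `probe_fn` callback and `pick_refresh_target` function are hypothetical names for illustration and do not exist in hiredis-cluster.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical probe: returns 1 if the address answers, 0 otherwise. */
typedef int (*probe_fn)(const char *addr);

/* Hypothetical node table that keeps the seed alongside learned nodes. */
struct node_table {
    const char *seed;               /* address given at init, e.g. the k8s Service */
    const char *learned[MAX_NODES]; /* addresses learned from 'cluster nodes' */
    size_t learned_count;
};

/* Picks the address to refresh the slot map from: learned nodes first,
 * the seed only as a last resort. Returns NULL if nothing is reachable. */
static const char *pick_refresh_target(const struct node_table *t,
                                       probe_fn probe) {
    for (size_t i = 0; i < t->learned_count; i++) {
        if (probe(t->learned[i]))
            return t->learned[i];
    }
    if (t->seed != NULL && probe(t->seed))
        return t->seed;
    return NULL;
}

/* Stand-in for the outage scenario: every pod IP is gone and only the
 * (assumed) Service name "redis-svc:6379" still answers. */
static int only_service_up(const char *addr) {
    return strcmp(addr, "redis-svc:6379") == 0;
}
```

With three dead pod IPs in `learned` and `only_service_up` as the probe, `pick_refresh_target` returns the seed, so the slot map could be relearned through the Service without rebuilding the context.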

zuiderkwast changed the title from "Behaviour when all redis master shards go down simultaneously." to "Behaviour when all redis nodes go down simultaneously." on Mar 16, 2023