
restore-cluster (to different cluster) fails when pssh pool size is smaller than the cluster size #803

Open
serban21 opened this issue Sep 16, 2024 · 0 comments


When `--pssh-pool-size` is smaller than the cluster size, the list of target hosts is split into multiple batches (see https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/orchestration.py#L57). The problem is that the list of source hosts passed to pssh as `host_args` (the list of old hosts taken from `--host-list`) is not split accordingly. So when a 12-node cluster is restored to a different cluster with the same number of nodes and a pool size of 3, the first 3 target nodes get the correct data from the first 3 old nodes in the host list, but the next 3 targets receive the same data (token ranges and SSTables) from those same first 3 source nodes. The end result is quite strange: Cassandra 4 actually starts on all 12 nodes, with errors in the logs, and `nodetool status` reports only the first 3 nodes even though Cassandra is running on all of them.

The solution is simple: split the source host list into matching batches as well. I'll create a PR without tests today, and then see whether I can add tests too.
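A minimal sketch of the idea (not Medusa's actual code; `batch`, `restore_in_batches`, and the host names are hypothetical): split both lists into chunks of the pool size and zip the chunks, so each batch of targets only sees its own batch of sources.

```python
# Hedged illustration of the proposed fix: batch the target hosts AND the
# source hosts (--host-list) together, instead of passing the full source
# list as host_args to every pssh batch.

def batch(seq, size):
    """Yield consecutive chunks of `seq` with at most `size` elements each."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def restore_in_batches(target_hosts, old_hosts, pssh_pool_size):
    """Pair each batch of target hosts with the corresponding source batch.

    Zipping the two batched lists keeps the target -> source mapping aligned.
    Passing the full `old_hosts` list to every batch (the bug reported above)
    makes every batch restore from the same first few source nodes.
    """
    pairs = []
    for targets, sources in zip(batch(target_hosts, pssh_pool_size),
                                batch(old_hosts, pssh_pool_size)):
        # In Medusa this is where pssh would run, with `sources` as host_args.
        pairs.append(list(zip(targets, sources)))
    return pairs

# With 6 nodes and a pool size of 3, each target maps to its own source:
print(restore_in_batches(["new1", "new2", "new3", "new4", "new5", "new6"],
                         ["old1", "old2", "old3", "old4", "old5", "old6"], 3))
```

With the buggy behavior, the second batch would be paired with `old1`–`old3` again; with the aligned batching it correctly gets `old4`–`old6`.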

┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-96
