When `--pssh-pool-size` is smaller than the cluster size, the list of hosts is split into multiple batches (see https://github.com/thelastpickle/cassandra-medusa/blob/master/medusa/orchestration.py#L57). The problem is that the list sent to pssh as `host_args` (the list of old hosts taken from `--host-list`) is not split. So when a 12-node cluster is restored to a different cluster with the same number of nodes and a pool size of 3, the first 3 nodes get the correct data from the first 3 old nodes in the host list. But the next 3 nodes get the same data (token ranges and SSTables) from the same first 3 old nodes. The end result is quite strange: Cassandra 4 actually starts on all 12 nodes, with errors in the logs, and `nodetool status` reports only the first 3 nodes (even though Cassandra is running on all of them).
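The mismatch can be reproduced in plain Python. This is a minimal sketch, not Medusa's code: `batches` and `buggy_pairing` are hypothetical helpers, and it assumes that within each batch the i-th host receives the i-th entry of the full, unsplit `host_args` list (modelled here with `zip`).

```python
# Hypothetical reproduction of the batching mismatch; not Medusa's actual code.
# Assumption: within each batch, the i-th host receives host_args[i] from the
# full, unsplit list of old hosts.

def batches(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def buggy_pairing(new_hosts, old_hosts, pool_size):
    """Return (new_host, old_host) pairs as the unsplit host_args would produce."""
    pairs = []
    for batch in batches(new_hosts, pool_size):
        # Bug: the whole old_hosts list is reused for every batch, so the
        # i-th host of each batch is always paired with old_hosts[i].
        for new_host, old_host in zip(batch, old_hosts):
            pairs.append((new_host, old_host))
    return pairs

new_hosts = [f"new{i}" for i in range(1, 13)]
old_hosts = [f"old{i}" for i in range(1, 13)]
pairs = buggy_pairing(new_hosts, old_hosts, pool_size=3)
print(pairs[3])  # ('new4', 'old1'): the 4th new node restores the 1st old node's data
```

Only `old1`, `old2` and `old3` ever appear as restore sources, which matches the symptom of every batch receiving the first 3 nodes' token ranges and SSTables.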
The solution is simple: split the host lists too. I'll create a PR without tests today and try to see if I can add tests as well.
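A minimal sketch of the fix, assuming the new hosts and the old hosts from `--host-list` are given in matching order: slice both lists with the same offsets so each batch carries only its own `host_args`. `batched_pairs` is a hypothetical helper, not Medusa's actual API.

```python
# Hypothetical sketch of splitting host_args alongside the hosts.
# Assumption: new_hosts[i] should be restored from old_hosts[i].

def batched_pairs(new_hosts, old_hosts, pool_size):
    """Yield (host_batch, args_batch) tuples sliced with identical offsets."""
    if len(new_hosts) != len(old_hosts):
        # Fail loudly rather than silently reusing another node's data.
        raise ValueError("new and old host lists must have the same length")
    result = []
    for start in range(0, len(new_hosts), pool_size):
        host_batch = new_hosts[start:start + pool_size]
        # Fix: slice the per-host arguments with the same bounds as the hosts.
        args_batch = old_hosts[start:start + pool_size]
        result.append((host_batch, args_batch))
    return result

new_hosts = [f"new{i}" for i in range(1, 13)]
old_hosts = [f"old{i}" for i in range(1, 13)]
for hosts, args in batched_pairs(new_hosts, old_hosts, pool_size=3):
    print(hosts, args)
```

Each batch now pairs `new4`..`new6` with `old4`..`old6`, and so on. Raising on a length mismatch is a design choice in this sketch, so an inconsistent `--host-list` is reported instead of producing the duplicated token ranges described above.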
┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: MED-96