Handle transient queue deletion in Khepri minority (backport #11979) (backport #11990) #11991
Transient queue deletion previously caused a crash if Khepri was enabled and a node with a transient queue went down while its cluster was in a minority. We need to handle the `{error,timeout}` return possible from `rabbit_db_queue:delete_transient/1`. In the `rabbit_amqqueue:on_node_down/1` callback we log a warning when we see this return.

We then try this deletion again during that node's `rabbit_khepri:init/0`, which is called from a boot step after `rabbit_khepri:setup/0`. At that point we can return an error and halt the node's boot if the command times out. The cluster is very likely to be in a majority at that point since `rabbit_khepri:setup/0` waits for a leader to be elected (requiring a majority).

This fixes a crash report found in the `cluster_minority_SUITE`'s `end_per_group`.

This is an automatic backport of pull request #11979 done by [Mergify](https://mergify.com).
This is an automatic backport of pull request #11990 done by [Mergify](https://mergify.com).
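
For readers skimming the description, the two code paths can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the module name `transient_delete_sketch` and the `DeleteFun` argument are hypothetical stand-ins, and `DeleteFun` is only assumed to return `ok` or `{error, timeout}` the way the description says the real deletion can.

```erlang
-module(transient_delete_sketch).
-export([on_node_down/2, retry_on_boot/2]).

%% DeleteFun is a hypothetical stand-in for the real deletion call.
%% It is assumed to return ok or {error, timeout}.

%% Node-down path: a timeout is expected while the cluster is in a
%% minority, so log a warning and carry on instead of crashing.
on_node_down(Node, DeleteFun) ->
    case DeleteFun(Node) of
        ok ->
            ok;
        {error, timeout} ->
            logger:warning("Timed out deleting transient queues of ~tp; "
                           "deletion will be retried on boot", [Node]),
            ok
    end.

%% Boot path: a timeout is returned to the caller, which can then
%% halt the boot rather than continue with stale queue records.
retry_on_boot(Node, DeleteFun) ->
    case DeleteFun(Node) of
        ok ->
            ok;
        {error, timeout} = Err ->
            Err
    end.
```

The asymmetry is the point of the sketch: the node-down callback tolerates the timeout, while the boot-time retry surfaces it, since a majority is expected to be available by then.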