Raft Leader doesn't come back up after application restart #607
Comments
It's caused by incorrect handling of the initial state when starting up. Let me fix it!
A node should not set `server_state` to `Leader` when just starting up, even when it is the only voter in a cluster. It still needs several steps to initialize leader-related fields before it becomes a leader. - Fix: databendlabs#607
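To make the idea concrete, here is a minimal Rust sketch of the intended startup behavior. The types and the `startup_state` helper are hypothetical, not openraft's actual internals: the point is only that even a sole voter comes up in a non-leader state and is promoted to `Leader` through the normal leader-establishment steps, which initialize the leader-only fields.

```rust
/// Hypothetical server state enum, loosely modeled on openraft's `ServerState`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServerState {
    Learner,
    Follower,
    Candidate,
    Leader,
}

/// Hypothetical startup helper: decide the state a node reports right after it
/// restores its persisted state from storage.
fn startup_state(is_voter: bool) -> ServerState {
    if is_voter {
        // Even if this node is the only voter, do NOT return `Leader` here:
        // leader-only data (replication progress, leases, ...) has not been
        // initialized yet, and skipping those steps is what trips invariant
        // checks such as the `unreachable!("it has to be a leader!!!")` panic
        // reported in this issue. Start as a follower and let the election
        // path promote the node.
        ServerState::Follower
    } else {
        ServerState::Learner
    }
}

fn main() {
    // A restarted single-node cluster: the sole voter comes up as Follower,
    // not Leader, and only becomes Leader after initialization completes.
    assert_eq!(startup_state(true), ServerState::Follower);
    assert_eq!(startup_state(false), ServerState::Learner);
    println!("startup states OK");
}
```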
I fixed this issue in:

With this patch, run the following script in `./example/raft-kv-rocksdb` to check that the bug is gone. 😎 And thank you for reporting this corner case!

```sh
#!/bin/sh
set -o errexit

# Issue an HTTP request to the local node and pretty-print the JSON response
# with jq when it is available.
rpc() {
    local uri=$1
    local body="$2"

    echo '---'" rpc(:$uri, $body)"
    {
        if [ ".$body" = "." ]; then
            time curl --silent "127.0.0.1:$uri"
        else
            time curl --silent "127.0.0.1:$uri" -H "Content-Type: application/json" -d "$body"
        fi
    } | {
        if type jq > /dev/null 2>&1; then
            jq
        else
            cat
        fi
    }
    echo
    echo
}

export RUST_LOG=info
LOG=info

# cd ./example/raft-kv-rocksdb
cargo build

# Start a node, initialize it as a single-node cluster, and write a value.
./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!
sleep 1

rpc 21001/cluster/init '{}'
sleep 1

rpc 21001/api/write '{"Set":{"key":"foo","value":"foo"}}'
rpc 21001/api/read '"foo"'
echo expect '"foo"'
sleep 1

echo kill pid:$pid
kill $pid
sleep 1

# Restart the same node and verify that it accepts writes again.
./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!
sleep 1

rpc 21001/api/write '{"Set":{"key":"foo","value":"new_value"}}'
rpc 21001/api/read '"foo"'
echo expect '"new_value"'

echo kill pid:$pid
kill $pid
```
Hi, I still sometimes encounter a similar error when restarting a single-node cluster (using the version from commit hash 4332722). According to the panic backtrace it happens during handling of ... During debugging I also saw some strange values: the current leader is set to the running node, but its state is Follower, which looks like some kind of inconsistency. Is that expected?
No, it is not expected. :( Is there a log file at debug level from when this happened? And what did you do to reproduce it?
I have a custom storage based on sledstore (I modified ExampleRequest to have Set and Remove values). More or less, the steps to reproduce were to initialize, shut down, start, and call client_write. I'll investigate further tomorrow to get the requested logs and create a simpler example where it happens.
I reproduced it. When a node has just started up, it needs a little time to initialize the leader data. Let me fix it!
… another round of election
- Test: the single-node restart test no longer expects the node to run an election.
- Refactor: add VoteHandler to handle vote-related operations.
- Change: make the `ServerState` default value `Learner`.
- Fix: databendlabs#607
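As a rough illustration of the "make `ServerState` default value `Learner`" change above (a sketch with a hypothetical enum, not openraft's actual definition), Rust's derived `Default` can pin the default variant so that any state constructed without further initialization is a non-leader:

```rust
/// Hypothetical enum mirroring the idea of the change: the default variant is
/// `Learner`, so a freshly constructed or restored state never starts out as
/// `Leader`; becoming Leader requires the explicit election/initialization steps.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ServerState {
    #[default]
    Learner,
    Follower,
    Candidate,
    Leader,
}

fn main() {
    assert_eq!(ServerState::default(), ServerState::Learner);
    println!("default server state: {:?}", ServerState::default());
}
```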
This bug will be fixed in:
With this patch, run the following script in `./example/raft-kv-rocksdb` to check if the bug is gone:
I can confirm it fixed my case. Thanks for such a quick response and fix.
Describe the bug
If you create a single-node cluster and restart the application, the new instance will panic with:

```
thread 'tokio-runtime-worker' panicked at 'internal error: entered unreachable code: it has to be a leader!!!', /mnt/c/dev/openraft/openraft/src/core/raft_core.rs:1041:13
```
It's been a while since I played with Openraft, so I don't remember if this is the default/expected behavior.
If this is the expected behavior, how can I avoid this crash and let the node come back up after a restart?
To Reproduce
Steps to reproduce the behavior (using the Rocksdb example):
Expected behavior
The node comes back up without a problem and resumes from where it was interrupted.
Actual behavior
The application panics.
Env (please complete the following information):
Additional files:
Rocksdb example logs:
My application logs (where I first noticed the problem):