Raft Leader doesn't come back up after application restart #607

Closed
Gab-Menezes opened this issue Dec 2, 2022 · 9 comments · Fixed by #609 or #640
Assignees: drmingdrmer
Labels: C-bug (Category: bug)

Comments


Gab-Menezes commented Dec 2, 2022

Describe the bug
If you create a single-node cluster and restart the application, the new instance will panic with:
'thread 'tokio-runtime-worker' panicked at 'internal error: entered unreachable code: it has to be a leader!!!', /mnt/c/dev/openraft/openraft/src/core/raft_core.rs:1041:13'
It's been a while since I played with Openraft, so I don't remember if this is the default/expected behavior.
If this is the expected behavior, how can I avoid this crash and let the node come back up after a restart?

To Reproduce
Steps to reproduce the behavior (using the Rocksdb example):

  1. RUST_LOG=debug ./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001
  2. POST to: http://127.0.0.1:21001/cluster/init
  3. POST to: http://127.0.0.1:21001/api/write | Body: {"Set":{"key":"foo","value":"bar"}}
  4. POST to: http://127.0.0.1:21001/api/read | Body: "foo"
  5. ctrl+c to kill the application
  6. RUST_LOG=debug ./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001
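
The same steps can be scripted. The following is only a minimal sketch of steps 1-6 as plain curl calls; it assumes the Rocksdb example binary has already been built, and it reuses the JSON bodies shown above and the '{}' init body used by the maintainer's verification script later in this thread.

#!/bin/sh
# Minimal repro sketch for steps 1-6 above; assumes ./target/debug/raft-key-value-rocks exists.

# Step 1: start the node.
RUST_LOG=debug ./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!
sleep 1

# Step 2: initialize the single-node cluster (empty JSON body, as in the script below).
curl --silent 127.0.0.1:21001/cluster/init -H 'Content-Type: application/json' -d '{}'

# Steps 3-4: write and read a key.
curl --silent 127.0.0.1:21001/api/write -H 'Content-Type: application/json' -d '{"Set":{"key":"foo","value":"bar"}}'
curl --silent 127.0.0.1:21001/api/read  -H 'Content-Type: application/json' -d '"foo"'

# Step 5: stop the node (stands in for ctrl+c).
kill $pid
sleep 1

# Step 6: restart with the same id and data; this is where the panic occurs before the fix.
RUST_LOG=debug ./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001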

Expected behavior
The node comes back up without a problem and resumes from where it was interrupted.

Actual behavior
The application panics.

Env (please complete the following information):

  • OS: Ubuntu 22.04
  • CPU: Ryzen 5600x
  • Rust-toolchain: cargo 1.65.0-nightly (ce40690a5 2022-08-09)
  • Openraft version: 2877be0 (aka: main)

Additional files:
Rocksdb example logs:

2022-12-02T05:46:35.280776Z  INFO openraft::core::raft_core: remove all replication
thread 'tokio-runtime-worker' panicked at 'internal error: entered unreachable code: it has to be a leader!!!', /mnt/c/dev/openraft/openraft/src/core/raft_core.rs:1041:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

My application logs (where I first noticed the problem):

2022-12-02T05:28:15.399783Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main: openraft::core::raft_core: raft node is initializing

2022-12-02T05:28:15.399984Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main: openraft::storage::helper: RaftStorage::get_membership membership_in_sm=EffectiveMembership { log_id: Some(LogId { leader_id: LeaderId { term: 1, node_id: 1 }, index: 28 }), membership: Membership { configs: [{1}], nodes: {1: Node { api_addr: "localhost:8081" }} }, all_members: {1} } membership_in_log=[]

2022-12-02T05:28:15.400094Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:calc_server_state: openraft::engine::engine_impl: states is_member=true is_leader=true is_leading=false

2022-12-02T05:28:15.400112Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main: openraft::core::raft_core: update election timeout after: 816ms, can_be_leader: false

2022-12-02T05:28:15.400120Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main: openraft::core::raft_core: id=1 target_state: Leader

2022-12-02T05:28:15.400141Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:report_metrics{replication=Update(None)}: openraft::core::raft_core: report_metrics: Metrics{id:1,Leader, term:1, last_log:Some(69), last_applied:Some(LogId { leader_id: LeaderId { term: 1, node_id: 1 }, index: 69 }), leader:Some(1), membership:{log_id:Some(LogId { leader_id: LeaderId { term: 1, node_id: 1 }, index: 28 }) membership:members:[{1:{Node { api_addr: "localhost:8081" }}}],learners:[]}, snapshot:None, replication:

2022-12-02T05:28:15.419059Z  INFO actix_server::builder: Starting 8 workers

2022-12-02T05:28:15.419069Z  INFO actix_server::server: Tokio runtime found; starting in existing Tokio runtime

2022-12-02T05:28:15.776232Z DEBUG new{id=1 cluster=foo}:tick: openraft::core::tick: Tick sent: 1

2022-12-02T05:28:15.776272Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}: openraft::core::raft_core: recv from rx_api: Tick 1

2022-12-02T05:28:15.776284Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}: openraft::core::raft_core: received tick: 1, now: Instant { tv_sec: 548574, tv_nsec: 439331363 }

2022-12-02T05:28:15.776291Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:leader_step_down: openraft::engine::engine_impl: leader_step_down: node_id:1

2022-12-02T05:28:15.776310Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:leader_step_down: openraft::engine::engine_impl: membership: {log_id:Some(LogId { leader_id: LeaderId { term: 1, node_id: 1 }, index: 28 }) membership:members:[{1:{Node { api_addr: "localhost:8081" }}}],learners:[]}, committed: 1-1-69, is_leading: false

2022-12-02T05:28:15.776323Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: start...

2022-12-02T05:28:16.152786Z DEBUG new{id=1 cluster=foo}:tick: openraft::core::tick: Tick sent: 2

2022-12-02T05:28:16.152821Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}: openraft::core::raft_core: recv from rx_api: Tick 2

2022-12-02T05:28:16.152834Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}: openraft::core::raft_core: received tick: 2, now: Instant { tv_sec: 548574, tv_nsec: 815880807 }

2022-12-02T05:28:16.152847Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:leader_step_down: openraft::engine::engine_impl: leader_step_down: node_id:1

2022-12-02T05:28:16.152858Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:leader_step_down: openraft::engine::engine_impl: membership: {log_id:Some(LogId { leader_id: LeaderId { term: 1, node_id: 1 }, index: 28 }) membership:members:[{1:{Node { api_addr: "localhost:8081" }}}],learners:[]}, committed: 1-1-69, is_leading: false

2022-12-02T05:28:16.152872Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: start...

2022-12-02T05:28:16.529333Z DEBUG new{id=1 cluster=foo}:tick: openraft::core::tick: Tick sent: 3

2022-12-02T05:28:16.529360Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}: openraft::core::raft_core: recv from rx_api: Tick 3

2022-12-02T05:28:16.529371Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}: openraft::core::raft_core: received tick: 3, now: Instant { tv_sec: 548575, tv_nsec: 192418357 }

2022-12-02T05:28:16.529384Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:elect: openraft::engine::engine_impl: vote is changing to vote=vote:2-1

2022-12-02T05:28:16.529394Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:elect: openraft::engine::engine_impl: set_server_state id=1 server_state=Leader

2022-12-02T05:28:16.529402Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:elect:update_progress: openraft::engine::engine_impl: update_progress: node_id:1 log_id:Some(LogId { leader_id: LeaderId { term: 2, node_id: 1 }, index: 70 })

2022-12-02T05:28:16.529418Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:elect:update_progress: openraft::engine::engine_impl: leader progress progress=VecProgress { quorum_set: Joint { data: [[1]], _p: PhantomData<(u64, alloc::vec::Vec<u64>)> }, granted: None, voter_count: 1, vector: [(1, ProgressEntry { matching: None, searching: Some(Searching { mid: 35, end: 70 }) })], stat: Stat { update_count: 0, move_count: 0, is_quorum_count: 0 } }

2022-12-02T05:28:16.529433Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:elect:update_progress: openraft::engine::engine_impl: committed after updating progress committed=Some(LogId { leader_id: LeaderId { term: 2, node_id: 1 }, index: 70 })

2022-12-02T05:28:16.529454Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: start...

2022-12-02T05:28:16.529468Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: SaveVote { vote: Vote { term: 2, node_id: 1, committed: false } }

2022-12-02T05:28:16.529478Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: SaveVote { vote: Vote { term: 2, node_id: 1, committed: true } }

2022-12-02T05:28:16.529482Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: UpdateReplicationStreams { targets: [] }

2022-12-02T05:28:16.529490Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: AppendBlankLog { log_id: LogId { leader_id: LeaderId { term: 2, node_id: 1 }, index: 70 } }

2022-12-02T05:28:16.529495Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: ReplicateCommitted { committed: Some(LogId { leader_id: LeaderId { term: 2, node_id: 1 }, index: 70 }) }

2022-12-02T05:28:16.529503Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: LeaderCommit { already_committed: Some(LogId { leader_id: LeaderId { term: 1, node_id: 1 }, index: 69 }), upto: LogId { leader_id: LeaderId { term: 2, node_id: 1 }, index: 70 } }

2022-12-02T05:28:16.529512Z DEBUG new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands: openraft::core::raft_core: run command: ReplicateEntries { upto: Some(LogId { leader_id: LeaderId { term: 2, node_id: 1 }, index: 70 }) }

2022-12-02T05:28:16.529592Z  INFO new{id=1 cluster=foo}:RaftCore{id=1 cluster=foo}:main:runtime_loop{id=1}:handle_api_msg{state=Leader id=1}:run_engine_commands:remove_all_replication: openraft::core::raft_core: remove all replication

thread 'tokio-runtime-worker' panicked at 'internal error: entered unreachable code: it has to be a leader!!!', /home/ubuntu/.cargo/git/checkouts/openraft-0677bbb78cc57964/229bb50/openraft/src/core/raft_core.rs:1043:13

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@drmingdrmer drmingdrmer self-assigned this Dec 3, 2022
@drmingdrmer drmingdrmer added the C-bug Category: bug label Dec 3, 2022
@drmingdrmer (Member)

It's caused by incorrect handling of the initial state when starting up. Let me fix it!

drmingdrmer added a commit to drmingdrmer/openraft that referenced this issue Dec 3, 2022
A node should not set `server_state` to `Leader` when just starting up,
even when it's the only voter in a cluster. It still needs several steps
to initialize leader-related fields to become a leader.

- Fix: databendlabs#607
@drmingdrmer (Member)

I fixed this issue in:

With this patch, run the following script in ./example/raft-kv-rocksdb to check if the bug is gone:

😎 And thank you for reporting this corner case!

#!/bin/sh

set -o errexit

rpc() {
    local uri=$1
    local body="$2"

    echo '---'" rpc(:$uri, $body)"

    {
        if [ ".$body" = "." ]; then
            time curl --silent "127.0.0.1:$uri"
        else
            time curl --silent "127.0.0.1:$uri" -H "Content-Type: application/json" -d "$body"
        fi
    } | {
        if type jq > /dev/null 2>&1; then
            jq
        else
            cat
        fi
    }

    echo
    echo
}

export RUST_LOG=info
LOG=info

# cd ./example/raft-kv-rocksdb

cargo build

./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!

sleep 1

rpc 21001/cluster/init '{}'

sleep 1

rpc 21001/api/write '{"Set":{"key":"foo","value":"foo"}}'
rpc 21001/api/read  '"foo"'
echo expect '"foo"'

sleep 1

echo kill pid:$pid
kill $pid

sleep 1

./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!

sleep 1

rpc 21001/api/write '{"Set":{"key":"foo","value":"new_value"}}'
rpc 21001/api/read  '"foo"'
echo expect '"new_value"'

echo kill pid:$pid
kill $pid


k-u-s commented Jan 4, 2023

Hi,

I still sometimes encounter a similar error when restarting a single-node cluster (using the version from commit hash 4332722). According to the panic backtrace, it happens while handling Command::ReplicateEntries inside the run_command method in raft_core.rs. Full logs: it-has-to-be-leader-panic.txt

During debugging I also saw some strange values

[debugger screenshot omitted]

The current leader is set to the running node, but its state is Follower; it looks like some kind of inconsistency. Is that expected?
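
One way to watch those two fields from outside the process is to poll the node's metrics over HTTP. This is only a sketch: it assumes the application exposes a GET /cluster/metrics route like the raft-kv-rocksdb example does, and that jq is installed; the fields to compare in the returned metrics are state and current_leader.

# Hypothetical check, assuming a /cluster/metrics route as in the openraft examples.
curl --silent 127.0.0.1:21001/cluster/metrics | jq .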

@drmingdrmer drmingdrmer reopened this Jan 4, 2023
@drmingdrmer (Member)

No, it is not expected. :(

Is there a log file at debug level from when this happened? And what did you do to reproduce it?


k-u-s commented Jan 4, 2023

I have a custom storage based on sledstore (I modified ExampleRequest to have Set and Remove variants).

More or less, the steps to reproduce were: initialize, shut down, start, and call client_write.

I'll investigate further tomorrow to get the requested logs and create a simpler example where it happens.

@drmingdrmer (Member)

I reproduced it. When a node has just started up, it needs a little time to initialize the leader data. Let me fix it!

drmingdrmer added a commit to drmingdrmer/openraft that referenced this issue Jan 4, 2023
… another round of election

- Test: single-node restart test does not expect the node to run
  election any more.

- Refactor: add VoteHandler to handle vote related operations.

- Change: make ServerState default value `Learner`.

- Fix: databendlabs#607
@drmingdrmer (Member)

This bug will be fixed in:

With this patch, run the following script in ./example/raft-kv-rocksdb to check if the bug is gone:

#!/bin/sh

set -o errexit

rpc() {
    local uri=$1
    local body="$2"

    echo '---'" rpc(:$uri, $body)"

    {
        if [ ".$body" = "." ]; then
            time curl --silent "127.0.0.1:$uri"
        else
            time curl --silent "127.0.0.1:$uri" -H "Content-Type: application/json" -d "$body"
        fi
    } | {
        if type jq > /dev/null 2>&1; then
            jq
        else
            cat
        fi
    }

    echo
    echo
}

export RUST_LOG=debug

# cd ./example/raft-kv-rocksdb

cargo build

./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!

sleep 1

rpc 21001/cluster/init '{}'

sleep 1

rpc 21001/api/write '{"Set":{"key":"foo","value":"foo"}}'
rpc 21001/api/read  '"foo"'
echo expect '"foo"'

sleep 1

echo kill pid:$pid
kill $pid

sleep 1

./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 &
pid=$!

sleep 0.3

rpc 21001/api/write '{"Set":{"key":"foo","value":"new_value"}}'
rpc 21001/api/read  '"foo"'
echo expect '"new_value"'

echo kill pid:$pid
kill $pid
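
As an optional add-on to the script above, the restarted node's output can be captured and checked for the "vote is changing" debug line that, in the logs at the top of this issue, appears when a new election starts. The commit referenced above changes single-node restart so that no new election is expected; the snippet below is only a sketch under the assumption that this log text is unchanged in your build.

# Optional: start the restarted node with its output captured to a file,
# then assert that no election was started. Replace the second node start above with:
./target/debug/raft-key-value-rocks --id 1 --http-addr 127.0.0.1:21001 --rpc-addr 127.0.0.1:22001 > restart.log 2>&1 &
pid=$!

# ...after the write/read checks and before killing the node:
if grep -q "vote is changing" restart.log; then
    echo "unexpected: the restarted node started another election" >&2
    exit 1
fi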


k-u-s commented Jan 5, 2023

I can confirm it fixed my case. Thanks for such a quick response and fix.
