
functional-tester: clean up, handle Operation_SIGQUIT_ETCD_AND_REMOVE_DATA #9548

Merged
merged 16 commits into etcd-io:master from functional-tester
Apr 9, 2018

Conversation

gyuho
Contributor

@gyuho gyuho commented Apr 9, 2018

Adding Operation_SIGQUIT_ETCD_AND_REMOVE_DATA for membership reconfiguration tests #9150.

In follow-up PRs, I will be adding:

// SIGQUIT_AND_REMOVE_ONE_FOLLOWER stops a randomly chosen follower
// (non-leader), deletes its data directories on disk, and removes
// this member from cluster (membership reconfiguration). On recovery,
// tester adds a new member, and this member joins the existing cluster
// with fresh data. It waits "failure-delay-ms" before recovering this
// failure. This simulates destroying one follower machine, where operator
// needs to add a new member from a fresh machine.
// The expected behavior is that a new member joins the existing cluster,
// and then each member continues to process client requests.
SIGQUIT_AND_REMOVE_ONE_FOLLOWER = 10;

// SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT stops a randomly
// chosen follower, deletes its data directories on disk, and removes
// this member from cluster (membership reconfiguration). On recovery,
// tester adds a new member, and this member joins the existing cluster
// with fresh data. On member remove, cluster waits until most up-to-date
// node (leader) applies the snapshot count of entries since the stop
// operation. This simulates destroying one follower machine, where
// operator needs to add a new member from a fresh machine.
// The expected behavior is that a new member joins the existing cluster,
// and receives a snapshot from the active leader. As always, after
// recovery, each member must be able to process client requests.
SIGQUIT_AND_REMOVE_ONE_FOLLOWER_UNTIL_TRIGGER_SNAPSHOT = 11;

// SIGQUIT_AND_REMOVE_LEADER stops the active leader node, deletes its
// data directories on disk, and removes this member from cluster.
// On recovery, tester adds a new member, and this member joins the
// existing cluster with fresh data. It waits "failure-delay-ms" before
// recovering this failure. This simulates destroying a leader machine,
// where operator needs to add a new member from a fresh machine.
// The expected behavior is that a new member joins the existing cluster,
// and then each member continues to process client requests.
SIGQUIT_AND_REMOVE_LEADER = 12;

// SIGQUIT_AND_REMOVE_LEADER_UNTIL_TRIGGER_SNAPSHOT stops the active leader,
// deletes its data directories on disk, and removes this member from
// cluster (membership reconfiguration). On recovery, tester adds a new
// member, and this member joins the existing cluster with fresh data. On member
// remove, cluster waits until most up-to-date node (new leader) applies
// the snapshot count of entries since the stop operation. This simulates
// destroying a leader machine, where operator needs to add a new member
// from a fresh machine.
// The expected behavior is that on member remove, cluster elects a new
// leader, and a new member joins the existing cluster and receives a
// snapshot from the newly elected leader. As always, after recovery, each
// member must be able to process client requests.
SIGQUIT_AND_REMOVE_LEADER_UNTIL_TRIGGER_SNAPSHOT = 13;

// SIGQUIT_AND_REMOVE_QUORUM_AND_ALL first stops a majority of nodes and
// deletes their data directories on disk, making the whole cluster
// inoperable. The tester then cannot even remove the stopped members,
// since quorum is lost.
// Let's assume a 3-node cluster of nodes A, B, and C. One day, nodes A
// and B are destroyed and all their data is gone. The only viable
// solution is to recover from C's latest snapshot. When nodes A and B
// become unavailable, tester also kills the last node C, creates a
// single-node cluster from scratch, and adds more nodes to establish a
// multi-node cluster.
// The expected behavior is that etcd successfully recovers from such a
// disastrous situation where only 1 node survives out of a 3-node
// cluster, new members join the existing cluster, and previous data from
// the snapshot is still preserved after the recovery process. As always,
// after recovery, each member must be able to process client requests.
SIGQUIT_AND_REMOVE_QUORUM_AND_ALL = 14;
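
For reference, a minimal sketch (not part of this PR) of what one such round might look like from the tester side, assuming direct access to the member's process and a clientv3 connection to a healthy member. The function name sigquitAndRemoveFollower and its inputs (followerPID, followerID, dataDir, peerURL) are hypothetical, and the clientv3 import path depends on the etcd version:

package tester

import (
	"context"
	"os"
	"syscall"
	"time"

	"github.com/coreos/etcd/clientv3"
)

// sigquitAndRemoveFollower sketches a single SIGQUIT_AND_REMOVE_ONE_FOLLOWER
// round: SIGQUIT the follower process, wipe its data directories, remove the
// member from the cluster, then add a fresh member in its place.
// followerPID, followerID, dataDir, and peerURL are hypothetical inputs.
func sigquitAndRemoveFollower(cli *clientv3.Client, followerPID int, followerID uint64, dataDir, peerURL string) error {
	// 1. Stop the follower and delete its data directories on disk.
	if err := syscall.Kill(followerPID, syscall.SIGQUIT); err != nil {
		return err
	}
	if err := os.RemoveAll(dataDir); err != nil {
		return err
	}

	// 2. Membership reconfiguration: remove the dead member from the cluster.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if _, err := cli.MemberRemove(ctx, followerID); err != nil {
		return err
	}

	// 3. On recovery, add a new member; the new process would then be started
	//    with --initial-cluster-state=existing so it joins the existing
	//    cluster with fresh data.
	_, err := cli.MemberAdd(ctx, []string{peerURL})
	return err
}

The *_UNTIL_TRIGGER_SNAPSHOT variants would additionally wait, after the member-remove step, until the leader has applied at least --snapshot-count more entries, so that the rejoining member is forced to receive a snapshot instead of replaying the log.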

gyuho added 7 commits April 7, 2018 10:00
gyuho added 2 commits April 9, 2018 10:16
@gyuho gyuho merged commit 10a51a3 into etcd-io:master Apr 9, 2018
@gyuho gyuho deleted the functional-tester branch April 9, 2018 17:20
gyuho added 7 commits April 9, 2018 10:22
For later "runner" cleanup

Signed-off-by: Gyuho Lee <gyuhox@gmail.com>