[Feature] Require ResignLeadership during upgrade #1734

Open: wants to merge 1 commit into base: master
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@
- (Feature) (Scheduler) Additional types
- (Feature) Alternative Upgrade Order Feature
- (Feature) (Scheduler) SchedV1 Integration
- (Feature) Require ResignLeadership during upgrade

## [1.2.42](https://github.com/arangodb/kube-arangodb/tree/1.2.42) (2024-07-23)
- (Maintenance) Go 1.22.4 & Kubernetes 1.29.6 libraries
1 change: 1 addition & 0 deletions README.md
@@ -150,6 +150,7 @@ Flags:
--deployment.feature.backup-cleanup Cleanup imported backups if required - Required ArangoDB 3.8.0 or higher
--deployment.feature.deployment-spec-defaults-restore Restore defaults from last accepted state of deployment - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.enforced-resign-leadership Enforce ResignLeadership and ensure that Leaders are moved from restarted DBServer - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ensure-secured-resign-leadership Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ephemeral-volumes Enables ephemeral volumes for apps and tmp directory - Required ArangoDB 3.8.0 or higher
--deployment.feature.failover-leadership Support for leadership in fail-over mode - Required ArangoDB 3.8.0 or higher
--deployment.feature.init-containers-copy-resources Copy resources spec to built-in init containers if they are not specified - Required ArangoDB 3.8.0 or higher (default true)
1 change: 1 addition & 0 deletions docs/cli/arangodb_operator.md
@@ -47,6 +47,7 @@ Flags:
--deployment.feature.backup-cleanup Cleanup imported backups if required - Required ArangoDB 3.8.0 or higher
--deployment.feature.deployment-spec-defaults-restore Restore defaults from last accepted state of deployment - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.enforced-resign-leadership Enforce ResignLeadership and ensure that Leaders are moved from restarted DBServer - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ensure-secured-resign-leadership Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ephemeral-volumes Enables ephemeral volumes for apps and tmp directory - Required ArangoDB 3.8.0 or higher
--deployment.feature.failover-leadership Support for leadership in fail-over mode - Required ArangoDB 3.8.0 or higher
--deployment.feature.init-containers-copy-resources Copy resources spec to built-in init containers if they are not specified - Required ArangoDB 3.8.0 or higher (default true)
2 changes: 2 additions & 0 deletions docs/generated/actions.md
@@ -37,6 +37,7 @@ nav_order: 11
| EncryptionKeyRemove | no | 10m0s | no | Enterprise Only | Remove the encryption key to the pool |
| EncryptionKeyStatusUpdate | no | 10m0s | no | Enterprise Only | Update status of encryption propagation |
| EnforceResignLeadership | no | 45m0s | yes | Community & Enterprise | Run the ResignLeadership job on DBServer and checks data compatibility after |
| EnsureSecuredResignLeadership | no | 10m0s | no | Community & Enterprise | Ensures that data is still replicated on other servers |
| Idle | no | 10m0s | no | Community & Enterprise | Define idle operation in case if preconditions are not meet |
| JWTAdd | no | 10m0s | no | Enterprise Only | Adds new JWT to the pool |
| JWTClean | no | 10m0s | no | Enterprise Only | Remove JWT key from the pool |
@@ -133,6 +134,7 @@ spec:
EncryptionKeyRemove: 10m0s
EncryptionKeyStatusUpdate: 10m0s
EnforceResignLeadership: 45m0s
EnsureSecuredResignLeadership: 10m0s
Idle: 10m0s
JWTAdd: 10m0s
JWTClean: 10m0s
2 changes: 1 addition & 1 deletion internal/actions.config.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2023-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion internal/actions.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion internal/actions.register.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion internal/actions.register.test.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
3 changes: 3 additions & 0 deletions internal/actions.yaml
@@ -33,6 +33,9 @@ actions:
description: Run the ResignLeadership job on DBServer and checks data compatibility after
timeout: 45m
optional: true
EnsureSecuredResignLeadership:
description: Ensures that data is still replicated on other servers
timeout: 10m
KillMemberPod:
description: Execute Delete on Pod (put pod in Terminating state)
scopes:
14 changes: 13 additions & 1 deletion pkg/apis/deployment/v1/actions.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -101,6 +101,9 @@ const (
// ActionEnforceResignLeadershipDefaultTimeout define default timeout for action ActionEnforceResignLeadership
ActionEnforceResignLeadershipDefaultTimeout time.Duration = 2700 * time.Second // 45m0s

// ActionEnsureSecuredResignLeadershipDefaultTimeout define default timeout for action ActionEnsureSecuredResignLeadership
ActionEnsureSecuredResignLeadershipDefaultTimeout time.Duration = 600 * time.Second // 10m0s

// ActionIdleDefaultTimeout define default timeout for action ActionIdle
ActionIdleDefaultTimeout time.Duration = ActionsDefaultTimeout

@@ -362,6 +365,9 @@ const (
// ActionTypeEnforceResignLeadership in scopes Normal. Run the ResignLeadership job on DBServer and checks data compatibility after
ActionTypeEnforceResignLeadership ActionType = "EnforceResignLeadership"

// ActionTypeEnsureSecuredResignLeadership in scopes Normal. Ensures that data is still replicated on other servers
ActionTypeEnsureSecuredResignLeadership ActionType = "EnsureSecuredResignLeadership"

// ActionTypeIdle in scopes Normal. Define idle operation in case if preconditions are not meet
ActionTypeIdle ActionType = "Idle"

@@ -601,6 +607,8 @@ func (a ActionType) DefaultTimeout() time.Duration {
return ActionEncryptionKeyStatusUpdateDefaultTimeout
case ActionTypeEnforceResignLeadership:
return ActionEnforceResignLeadershipDefaultTimeout
case ActionTypeEnsureSecuredResignLeadership:
return ActionEnsureSecuredResignLeadershipDefaultTimeout
case ActionTypeIdle:
return ActionIdleDefaultTimeout
case ActionTypeJWTAdd:
@@ -779,6 +787,8 @@ func (a ActionType) Priority() ActionPriority {
return ActionPriorityNormal
case ActionTypeEnforceResignLeadership:
return ActionPriorityNormal
case ActionTypeEnsureSecuredResignLeadership:
return ActionPriorityNormal
case ActionTypeIdle:
return ActionPriorityNormal
case ActionTypeJWTAdd:
@@ -969,6 +979,8 @@ func (a ActionType) Optional() bool {
return false
case ActionTypeEnforceResignLeadership:
return true
case ActionTypeEnsureSecuredResignLeadership:
return false
case ActionTypeIdle:
return false
case ActionTypeJWTAdd:
14 changes: 13 additions & 1 deletion pkg/apis/deployment/v2alpha1/actions.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -101,6 +101,9 @@ const (
// ActionEnforceResignLeadershipDefaultTimeout define default timeout for action ActionEnforceResignLeadership
ActionEnforceResignLeadershipDefaultTimeout time.Duration = 2700 * time.Second // 45m0s

// ActionEnsureSecuredResignLeadershipDefaultTimeout define default timeout for action ActionEnsureSecuredResignLeadership
ActionEnsureSecuredResignLeadershipDefaultTimeout time.Duration = 600 * time.Second // 10m0s

// ActionIdleDefaultTimeout define default timeout for action ActionIdle
ActionIdleDefaultTimeout time.Duration = ActionsDefaultTimeout

@@ -362,6 +365,9 @@ const (
// ActionTypeEnforceResignLeadership in scopes Normal. Run the ResignLeadership job on DBServer and checks data compatibility after
ActionTypeEnforceResignLeadership ActionType = "EnforceResignLeadership"

// ActionTypeEnsureSecuredResignLeadership in scopes Normal. Ensures that data is still replicated on other servers
ActionTypeEnsureSecuredResignLeadership ActionType = "EnsureSecuredResignLeadership"

// ActionTypeIdle in scopes Normal. Define idle operation in case if preconditions are not meet
ActionTypeIdle ActionType = "Idle"

@@ -601,6 +607,8 @@ func (a ActionType) DefaultTimeout() time.Duration {
return ActionEncryptionKeyStatusUpdateDefaultTimeout
case ActionTypeEnforceResignLeadership:
return ActionEnforceResignLeadershipDefaultTimeout
case ActionTypeEnsureSecuredResignLeadership:
return ActionEnsureSecuredResignLeadershipDefaultTimeout
case ActionTypeIdle:
return ActionIdleDefaultTimeout
case ActionTypeJWTAdd:
@@ -779,6 +787,8 @@ func (a ActionType) Priority() ActionPriority {
return ActionPriorityNormal
case ActionTypeEnforceResignLeadership:
return ActionPriorityNormal
case ActionTypeEnsureSecuredResignLeadership:
return ActionPriorityNormal
case ActionTypeIdle:
return ActionPriorityNormal
case ActionTypeJWTAdd:
@@ -969,6 +979,8 @@ func (a ActionType) Optional() bool {
return false
case ActionTypeEnforceResignLeadership:
return true
case ActionTypeEnsureSecuredResignLeadership:
return false
case ActionTypeIdle:
return false
case ActionTypeJWTAdd:
44 changes: 43 additions & 1 deletion pkg/deployment/agency/state/state.go
@@ -1,7 +1,7 @@
//
// DISCLAIMER
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -262,6 +262,48 @@ func (s State) PlanLeaderServersWithFailOver() Servers {
return r
}

// IsServerWithShardBackup returns true if every replicated shard led by the server has at least one in-sync follower, i.e. the server can be restarted without risk
func (s State) IsServerWithShardBackup(server Server) bool {
for db, dbData := range s.Plan.Collections {
for collection, collectionData := range dbData {
for shard, shardDetails := range collectionData.Shards {
if len(shardDetails) <= 1 {
// RF is 1, nothing to do
continue
}

// Find current state
currentDBs, ok := s.Current.Collections[db]
if !ok {
continue
}

currentCollection, ok := currentDBs[collection]
if !ok {
continue
}

currentShard, ok := currentCollection[shard]
if !ok {
continue
}

if len(currentShard.Servers) == 0 {
continue
}

if currentShard.Servers[0] == server {
if len(currentShard.Servers) == 1 {
return false
}
}
}
}
}

return true
}

type CollectionShardDetails []CollectionShardDetail

type CollectionShardDetail struct {
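The check added in `IsServerWithShardBackup` can be read in isolation with flattened data. The sketch below is only an approximation for review purposes: `shardServers` and `hasShardBackup` are hypothetical names, and the nested database/collection/shard maps of the agency state are collapsed into a single shard-keyed map. As in the diff, index 0 of a server list is treated as the leader.

```go
package main

import "fmt"

// Server is a simplified stand-in for the agency Server type.
type Server string

// shardServers maps a shard name to its ordered server list (hypothetical type);
// index 0 is the leader.
type shardServers map[string][]Server

// hasShardBackup reports whether every replicated shard currently led by server
// has at least one in-sync follower listed in current.
func hasShardBackup(plan, current shardServers, server Server) bool {
	for shard, planned := range plan {
		if len(planned) <= 1 {
			continue // replication factor 1: no follower is expected
		}
		cur, ok := current[shard]
		if !ok || len(cur) == 0 {
			continue // shard is not reported in Current yet
		}
		if cur[0] == server && len(cur) == 1 {
			return false // server is the leader and the only in-sync copy
		}
	}
	return true
}

func main() {
	plan := shardServers{"s1": {"A", "B"}}
	current := shardServers{"s1": {"A"}}

	fmt.Println(hasShardBackup(plan, current, "A")) // prints false
	fmt.Println(hasShardBackup(plan, current, "B")) // prints true
}
```

This matches the "missing replica" and "not affected replica" cases in the accompanying test: only the current leader of an under-replicated shard is considered unsafe to restart.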
54 changes: 53 additions & 1 deletion pkg/deployment/agency/state/state_test.go
@@ -1,7 +1,7 @@
//
// DISCLAIMER
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -307,6 +307,58 @@ func Test_IsDBServerReadyToRestart(t *testing.T) {
}
}

func Test_IsServerWithShardBackup(t *testing.T) {
type testCase struct {
generator Generator
ready bool
server Server
}
newDBWithCol := func(writeConcern int) CollectionGeneratorInterface {
return NewDatabaseRandomGenerator().RandomCollection().WithWriteConcern(writeConcern)
}
tcs := map[string]testCase{
"missing replica": {
generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A").Add().Add().Add(),
ready: false,
server: "A",
},
"ready replica": {
generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A", "B").Add().Add().Add(),
ready: true,
server: "A",
},
"not affected replica": {
generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A").Add().Add().Add(),
ready: true,
server: "B",
},
"not affected nonexisting replica": {
generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A").Add().Add().Add(),
ready: true,
server: "C",
},
"rf1": {
generator: newDBWithCol(1).WithShard().WithPlan("A").WithCurrent("A").Add().Add().Add(),
ready: true,
server: "A",
},
}

for name, tc := range tcs {
t.Run(name, func(t *testing.T) {
s := GenerateState(t, tc.generator)

res := s.IsServerWithShardBackup(tc.server)

if tc.ready {
require.True(t, res)
} else {
require.False(t, res)
}
})
}
}

func Test_GetCollectionDatabaseByID(t *testing.T) {
var s DumpState
require.NoError(t, json.Unmarshal(agencyDump39, &s))
13 changes: 13 additions & 0 deletions pkg/deployment/features/resign_leadership.go
@@ -22,6 +22,7 @@ package features

func init() {
registerFeature(enforcedResignLeadership)
registerFeature(ensureSecuredResignLeadership)
}

var enforcedResignLeadership = &feature{
@@ -31,7 +32,19 @@ var enforcedResignLeadership = &feature{
enabledByDefault: true,
}

var ensureSecuredResignLeadership = &feature{
name: "ensure-secured-resign-leadership",
description: "Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers",
enterpriseRequired: false,
enabledByDefault: true,
}

// EnforcedResignLeadership returns enforced ResignLeadership.
func EnforcedResignLeadership() Feature {
return enforcedResignLeadership
}

// EnsureSecuredResignLeadership returns the feature that ensures data is replicated on other DBServers.
func EnsureSecuredResignLeadership() Feature {
return ensureSecuredResignLeadership
}
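How the two feature flags might gate the actions scheduled before a DBServer restart can be sketched as below. The plan builder is not part of the files shown here, so everything in this block is an assumption for illustration: the `override` field, `Enabled`, `planRestart`, and the ordering of the two actions are hypothetical and simplified from the `feature` struct in the diff.

```go
package main

import "fmt"

// feature is a simplified, hypothetical stand-in for the operator's feature type.
type feature struct {
	name             string
	enabledByDefault bool
	override         *bool // assumption: set when the operator flag is passed explicitly
}

// Enabled returns the explicit override if present, otherwise the default.
func (f *feature) Enabled() bool {
	if f.override != nil {
		return *f.override
	}
	return f.enabledByDefault
}

// planRestart returns the names of the actions to run before restarting a DBServer.
// The ordering is an assumption: resign leadership first, then verify replication.
func planRestart(enforce, ensure *feature) []string {
	var actions []string
	if enforce.Enabled() {
		actions = append(actions, "EnforceResignLeadership")
	}
	if ensure.Enabled() {
		actions = append(actions, "EnsureSecuredResignLeadership")
	}
	return actions
}

func main() {
	enforce := &feature{name: "enforced-resign-leadership", enabledByDefault: true}
	ensure := &feature{name: "ensure-secured-resign-leadership", enabledByDefault: true}
	fmt.Println(planRestart(enforce, ensure))

	// Disabling the new feature leaves only the existing action.
	off := false
	ensure.override = &off
	fmt.Println(planRestart(enforce, ensure))
}
```

Since both features default to `true`, a deployment that passes neither flag would get both actions under this sketch.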
2 changes: 1 addition & 1 deletion pkg/deployment/reconcile/action.config.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2023-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
19 changes: 18 additions & 1 deletion pkg/deployment/reconcile/action.register.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -96,6 +96,9 @@ var (
_ Action = &actionEnforceResignLeadership{}
_ actionFactory = newEnforceResignLeadershipAction

_ Action = &actionEnsureSecuredResignLeadership{}
_ actionFactory = newEnsureSecuredResignLeadershipAction

_ Action = &actionIdle{}
_ actionFactory = newIdleAction

@@ -619,6 +622,20 @@ func init() {
registerAction(action, function)
}

// EnsureSecuredResignLeadership
{
// Get Action type
action := api.ActionTypeEnsureSecuredResignLeadership

// Get Action definition
function := newEnsureSecuredResignLeadershipAction

// Wrap action main function

// Register action
registerAction(action, function)
}

// Idle
{
// Get Action type