[fix] handle requeues for linode api errors on update and delete, set instance ID if linode already exists #408

Merged
merged 3 commits into from
Jul 16, 2024

@AshleyDumaine (Contributor) commented Jul 10, 2024

What this PR does / why we need it: While @rahulait was performing scale testing, update and delete reconciles errored out on Linode API errors, e.g.:

2024-06-29T00:00:07Z	ERROR	LinodeMachineReconciler	Failed to get Linode machine instance	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-65r6c-s8lqh","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-65r6c-s8lqh", "reconcileID": "b2070ab5-a216-4365-87f1-bfa2a2ee1076", "name": "default/kubeadm4-md-0-65r6c-s8lqh", "LinodeMachine": "kubeadm4-md-0-65r6c-s8lqh", "ID": 60786582, "error": "[502] Internal server error"}
2024-06-29T00:00:07Z	DEBUG	events	[502] Internal server error	{"type": "Warning", "object": {"kind":"LinodeMachine","namespace":"default","name":"kubeadm4-md-0-65r6c-s8lqh","uid":"7f71aad9-9775-48ea-bb4f-a93e09560e79","apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha1","resourceVersion":"1323350"}, "reason": "UpdateError"}
2024-06-29T00:00:07Z	ERROR	Reconciler error	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-65r6c-s8lqh","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-65r6c-s8lqh", "reconcileID": "b2070ab5-a216-4365-87f1-bfa2a2ee1076", "error": "[502] Internal server error"}

A second error occurred during Linode instance creation: if the instance already existed, its ID was not set on the LinodeMachine spec. This caused reconcile errors after create, and the subsequent delete during cleanup found no ID to act on:

2024-07-01T22:05:34Z	INFO	LinodeMachineReconciler	creating machine	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-k9jk5-fd6dr","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-k9jk5-fd6dr", "reconcileID": "eebbb296-243b-4d8c-a4cf-f7bae3559876", "name": "default/kubeadm4-md-0-k9jk5-fd6dr", "LinodeMachine": "kubeadm4-md-0-k9jk5-fd6dr"}
2024-07-01T22:05:34Z	INFO	LinodeMachineReconciler	Linode instance already exists	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-k9jk5-fd6dr","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-k9jk5-fd6dr", "reconcileID": "eebbb296-243b-4d8c-a4cf-f7bae3559876", "name": "default/kubeadm4-md-0-k9jk5-fd6dr", "LinodeMachine": "kubeadm4-md-0-k9jk5-fd6dr"}
2024-07-01T22:12:49Z	INFO	LinodeMachineReconciler	updating machine	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-k9jk5-fd6dr","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-k9jk5-fd6dr", "reconcileID": "43596eeb-9b33-4b00-987d-823e79385e1c", "name": "default/kubeadm4-md-0-k9jk5-fd6dr", "LinodeMachine": "kubeadm4-md-0-k9jk5-fd6dr"}
2024-07-01T22:12:49Z	DEBUG	events	missing instance ID	{"type": "Warning", "object": {"kind":"LinodeMachine","namespace":"default","name":"kubeadm4-md-0-k9jk5-fd6dr","uid":"5a556379-5957-4258-ae2f-227b575830ef","apiVersion":"infrastructure.cluster.x-k8s.io/v1alpha1","resourceVersion":"28601"}, "reason": "UpdateError"}

2024-07-01T22:12:49Z	ERROR	Reconciler error	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-k9jk5-fd6dr","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-k9jk5-fd6dr", "reconcileID": "43596eeb-9b33-4b00-987d-823e79385e1c", "error": "missing instance ID"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:261
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/internal/controller/controller.go:222
2024-07-02T00:40:34Z	INFO	LinodeMachineReconciler	deleting machine	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-k9jk5-fd6dr","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-k9jk5-fd6dr", "reconcileID": "02641089-4faa-4953-b0a6-b27a29f1e449", "name": "default/kubeadm4-md-0-k9jk5-fd6dr", "LinodeMachine": "kubeadm4-md-0-k9jk5-fd6dr"}
2024-07-02T00:40:34Z	INFO	LinodeMachineReconciler	Machine ID is missing, nothing to do	{"controller": "linodemachine", "controllerGroup": "infrastructure.cluster.x-k8s.io", "controllerKind": "LinodeMachine", "LinodeMachine": {"name":"kubeadm4-md-0-k9jk5-fd6dr","namespace":"default"}, "namespace": "default", "name": "kubeadm4-md-0-k9jk5-fd6dr", "reconcileID": "02641089-4faa-4953-b0a6-b27a29f1e449", "name": "default/kubeadm4-md-0-k9jk5-fd6dr", "LinodeMachine": "kubeadm4-md-0-k9jk5-fd6dr"}
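The second fix, setting the instance ID when the Linode already exists, can be sketched as an "adopt or create" step. This assumes, hypothetically, that the controller finds an existing instance by its label; `Instance`, `LinodeMachineSpec.InstanceID`, `InstanceClient`, `ensureInstance`, `requireInstanceID`, and `fakeClient` are illustrative names, not the provider's actual types. The point it shows is that the ID is written to the spec on both paths, so a later update does not fail with `missing instance ID` and a later delete has an instance to remove.

```go
package main

import (
	"errors"
	"fmt"
)

// Instance is a hypothetical, trimmed-down view of a Linode instance.
type Instance struct {
	ID    int
	Label string
}

// LinodeMachineSpec is a hypothetical stand-in; only the field that stores
// the instance ID is modeled, and its real name may differ.
type LinodeMachineSpec struct {
	InstanceID *int
}

// InstanceClient is a hypothetical interface over the two API calls used here.
type InstanceClient interface {
	ListByLabel(label string) ([]Instance, error)
	Create(label string) (*Instance, error)
}

var errMissingInstanceID = errors.New("missing instance ID")

// requireInstanceID is a hypothetical guard similar in effect to the check
// behind the "missing instance ID" update error in the logs above.
func requireInstanceID(spec *LinodeMachineSpec) (int, error) {
	if spec.InstanceID == nil {
		return 0, errMissingInstanceID
	}
	return *spec.InstanceID, nil
}

// ensureInstance adopts an existing instance with the given label or creates
// one, and records the ID on the spec in both cases.
func ensureInstance(c InstanceClient, spec *LinodeMachineSpec, label string) error {
	existing, err := c.ListByLabel(label)
	if err != nil {
		return err
	}
	var inst *Instance
	if len(existing) > 0 {
		// Already exists (e.g. an earlier reconcile created it but did not
		// persist the ID): adopt it instead of returning without an ID.
		inst = &existing[0]
	} else if inst, err = c.Create(label); err != nil {
		return err
	}
	id := inst.ID
	spec.InstanceID = &id
	return nil
}

// fakeClient is an in-memory client used only to exercise the sketch.
type fakeClient struct {
	instances []Instance
	nextID    int
}

func (f *fakeClient) ListByLabel(label string) ([]Instance, error) {
	var out []Instance
	for _, inst := range f.instances {
		if inst.Label == label {
			out = append(out, inst)
		}
	}
	return out, nil
}

func (f *fakeClient) Create(label string) (*Instance, error) {
	f.nextID++
	inst := Instance{ID: f.nextID, Label: label}
	f.instances = append(f.instances, inst)
	return &inst, nil
}

func main() {
	c := &fakeClient{instances: []Instance{{ID: 12345, Label: "example-machine"}}}
	spec := &LinodeMachineSpec{}
	if err := ensureInstance(c, spec, "example-machine"); err != nil {
		fmt.Println("error:", err)
		return
	}
	id, err := requireInstanceID(spec)
	fmt.Println(id, err) // 12345 <nil>
}
```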

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • adds or updates e2e tests

codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 52.94118% with 8 lines in your changes missing coverage. Please review.

Project coverage is 67.50%. Comparing base (447354e) to head (5997efb).

Files | Patch % | Lines
--- | --- | ---
controller/linodemachine_controller.go | 52.94% | 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
+ Coverage   66.10%   67.50%   +1.39%     
==========================================
  Files          42       42              
  Lines        2623     2625       +2     
==========================================
+ Hits         1734     1772      +38     
+ Misses        794      746      -48     
- Partials       95      107      +12     

@AshleyDumaine AshleyDumaine changed the title from "[fix] handle requeues for linode api errors on update and delete" to "[fix] handle requeues for linode api errors on update and delete, set instance ID if linode already exists" Jul 10, 2024
@AshleyDumaine AshleyDumaine force-pushed the requeue-on-api-error branch 2 times, most recently from d999a15 to 2f35892 on July 11, 2024 13:46
@AshleyDumaine AshleyDumaine marked this pull request as ready for review July 15, 2024 13:25
@AshleyDumaine AshleyDumaine force-pushed the requeue-on-api-error branch from 43f4272 to 576cb65 on July 15, 2024 21:18
@AshleyDumaine AshleyDumaine force-pushed the requeue-on-api-error branch 2 times, most recently from e334d7e to 314c0e5 on July 15, 2024 21:23
eljohnson92 previously approved these changes Jul 16, 2024
eljohnson92 previously approved these changes Jul 16, 2024
@AshleyDumaine AshleyDumaine merged commit 703a6b6 into main Jul 16, 2024
11 of 12 checks passed
@AshleyDumaine AshleyDumaine deleted the requeue-on-api-error branch July 16, 2024 17:33