Yurtctl: add precheck for reducing convert failure #675

Merged
1 commit merged into openyurtio:master from the yurtctl-precheck branch
Dec 28, 2021

Conversation

Peeknut
Member

@Peeknut Peeknut commented Dec 13, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Before yurtctl convert performs the conversion, it now runs pre-flight checks so that problems are caught early and fewer conversions fail.

Which issue(s) this PR fixes:

Fixes #619

Special notes for your reviewer:

/assign @adamzhoul @DrmagicE @rambohe-ch

Does this PR introduce a user-facing change?


other Note

@openyurt-bot
Collaborator

@Peeknut: GitHub didn't allow me to assign the following users: your_reviewer.

Note that only openyurtio members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@Peeknut
Member Author

Peeknut commented Dec 13, 2021

Test:

[root@master bin]# ./yurtctl convert -c master
[preflight] Running pre-flight checks
[runConvert] Label all nodes with edgeworker label, annotate all nodes with autonomy annotation
[runConvert] Deploying yurt-controller-manager
[runConvert] Running jobs for convert. Job running may take a long time, and job failure will not affect the execution of the next stage
[runConvert] Running disable-node-controller jobs to disable node-controller
	[INFO] servant job(yurtctl-disable-node-controller-master) has succeeded
[runConvert] Running node-servant-convert jobs to deploy the yurt-hub and reset the kubelet service on edge and cloud nodes
	[INFO] servant job(node-servant-convert-node2) has succeeded
	[INFO] servant job(node-servant-convert-node1) has succeeded
	[INFO] servant job(node-servant-convert-master) has succeeded
[runConvert] If any job fails, you can get job information through 'kubectl get jobs -n kube-system' to debug.
	Note that before the next conversion, please delete all related jobs so as not to affect the conversion.
[root@master bin]#


[root@master ~]# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   coredns-54d67798b7-28r2s                   1/1     Running   0          4h9m
kube-system   coredns-54d67798b7-wlmh8                   1/1     Running   0          4h9m
kube-system   etcd-master                                1/1     Running   0          4h9m
kube-system   kube-apiserver-master                      1/1     Running   0          4h9m
kube-system   kube-controller-manager-master             1/1     Running   0          30m
kube-system   kube-flannel-ds-4fmxr                      1/1     Running   0          4h7m
kube-system   kube-flannel-ds-7r2fh                      1/1     Running   0          4h8m
kube-system   kube-flannel-ds-j2xs5                      1/1     Running   0          4h7m
kube-system   kube-proxy-6jh7s                           1/1     Running   0          4h9m
kube-system   kube-proxy-9zdnz                           1/1     Running   0          4h7m
kube-system   kube-proxy-trw2v                           1/1     Running   0          4h7m
kube-system   kube-scheduler-master                      1/1     Running   0          4h9m
kube-system   yurt-controller-manager-77b97fd47b-cg9cc   1/1     Running   0          30m
kube-system   yurt-hub-master                            1/1     Running   0          28m
kube-system   yurt-hub-node1                             1/1     Running   0          29m
kube-system   yurt-hub-node2                             1/1     Running   0          29m
[root@master ~]#

@Peeknut
Member Author

Peeknut commented Dec 13, 2021

@adamzhoul
As mentioned in issue #619

if use job, can we get the job pod log and print?
and if the job failed, we can print the error and delete the job.

a failed job like convert always stops users from retrying again.

The pods produce a lot of log output, and if several pods fail, too much information would be displayed, so I think it is better not to print the logs of failed pods.
Instead, the prompt message tells users to look up the relevant information themselves and to delete the related jobs before the next conversion:
[runConvert] If any job fails, you can get job information through 'kubectl get jobs -n kube-system' to debug.
	Note that before the next conversion, please delete all related jobs so as not to affect the conversion.
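
The cleanup that the prompt message asks for could be sketched as follows. This is only an illustration, not code from this PR: cleanupCommands is a hypothetical helper, and the job name patterns are inferred from the log output in this thread (yurtctl-disable-node-controller-<node> and node-servant-convert-<node>), so they are an assumption that should be confirmed with 'kubectl get jobs -n kube-system' first.

```go
package main

import "fmt"

// cleanupCommands returns kubectl commands that delete the servant jobs
// left behind by a previous conversion attempt.
// The job name patterns are inferred from the logs in this thread and
// are an assumption, not a documented interface.
func cleanupCommands(cloudNodes, allNodes []string) []string {
	var cmds []string
	// One disable-node-controller job per cloud node.
	for _, n := range cloudNodes {
		cmds = append(cmds, fmt.Sprintf("kubectl delete job yurtctl-disable-node-controller-%s -n kube-system", n))
	}
	// One node-servant-convert job per node, cloud and edge alike.
	for _, n := range allNodes {
		cmds = append(cmds, fmt.Sprintf("kubectl delete job node-servant-convert-%s -n kube-system", n))
	}
	return cmds
}

func main() {
	// Node names taken from the test cluster shown in this thread.
	for _, c := range cleanupCommands([]string{"master"}, []string{"master", "node1", "node2"}) {
		fmt.Println(c)
	}
}
```

The sketch only prints the commands instead of running them, so the list can be reviewed before anything is deleted.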

@rambohe-ch
Member

/assign @adamzhoul

@adamzhoul @Peeknut I think we should make sure that yurtctl convert succeeds whenever the yurtctl precheck passes.

@Peeknut changed the title from "Yurtctl: add precheck for reducing convert failure" to "[WIP]Yurtctl: add precheck for reducing convert failure" Dec 14, 2021
@openyurt-bot added the do-not-merge/work-in-progress label Dec 14, 2021
@Peeknut force-pushed the yurtctl-precheck branch 2 times, most recently from 52f07b9 to c31a712 December 24, 2021 09:32
@Peeknut
Member Author

Peeknut commented Dec 24, 2021

test case
case 1: run success


[root@master bin]# ./yurtctl convert -c master --node-servant-image=openyurt/node-servant:test
[preflight] Running pre-flight checks
[preflight] Running node-servant-preflight-convert jobs to check on edge and cloud nodes. Job running may take a long time, and job failure will affect the execution of the next stage
[runConvert] Label all nodes with edgeworker label, annotate all nodes with autonomy annotation
[runConvert] Deploying yurt-controller-manager
[runConvert] Running jobs for convert. Job running may take a long time, and job failure will not affect the execution of the next stage
[runConvert] Running disable-node-controller jobs to disable node-controller
	[INFO] servant job(yurtctl-disable-node-controller-master) has succeeded
[runConvert] Running node-servant-convert jobs to deploy the yurt-hub and reset the kubelet service on edge and cloud nodes
	[INFO] servant job(node-servant-convert-node1) has succeeded
	[INFO] servant job(node-servant-convert-node2) has succeeded
	[INFO] servant job(node-servant-convert-master) has succeeded
[runConvert] If any job fails, you can get job information through 'kubectl get jobs -n kube-system' to debug.
	Note that before the next conversion, please delete all related jobs so as not to affect the conversion.
[root@master bin]#


case 2: node status check

[root@master bin]# kubectl get nodes
NAME     STATUS     ROLES                  AGE   VERSION
master   Ready      control-plane,master   11d   v1.20.0
node1    NotReady   <none>                 11d   v1.20.0
node2    Ready      <none>                 11d   v1.20.0
[root@master bin]#
[root@master bin]# ./yurtctl convert -c master --node-servant-image=openyurt/node-servant:test
[preflight] Running pre-flight checks
E1224 17:03:15.612388   16927 convert.go:84] Fail to run pre-flight checks: [preflight] Some fatal errors occurred:
	[ERROR NodeReady]: the status of nodes: [node1] is not 'Ready'
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
[root@master bin]#
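
The NodeReady failure in case 2 can be pictured with a small sketch. It is not the code from this PR: the Node struct and the nodeReadyCheck function are hypothetical names, and a real check would read each node's Ready condition from the Kubernetes API instead of an in-memory slice. Only the error wording is taken from the log above.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a Kubernetes node.
// Real code would inspect the node's Ready condition through the API.
type Node struct {
	Name  string
	Ready bool
}

// nodeReadyCheck returns an error naming every node that is not Ready,
// or nil when all nodes are Ready.
func nodeReadyCheck(nodes []Node) error {
	var notReady []string
	for _, n := range nodes {
		if !n.Ready {
			notReady = append(notReady, n.Name)
		}
	}
	if len(notReady) > 0 {
		// %v renders the slice in brackets, e.g. [node1].
		return fmt.Errorf("the status of nodes: %v is not 'Ready'", notReady)
	}
	return nil
}

func main() {
	// Mirrors the cluster state shown in case 2: node1 is NotReady.
	nodes := []Node{{"master", true}, {"node1", false}, {"node2", true}}
	if err := nodeReadyCheck(nodes); err != nil {
		fmt.Printf("[ERROR NodeReady]: %v\n", err)
	}
}
```

Collecting all failing nodes before returning, rather than stopping at the first one, lets the user fix every NotReady node in a single pass.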

case 3: label check

[root@master bin]# kubectl label node node2 openyurt.io/is-edge-worker=true
node/node2 labeled
[root@master bin]# ./yurtctl convert -c master --node-servant-image=openyurt/node-servant:test
[preflight] Running pre-flight checks
E1224 17:04:37.693953   17441 convert.go:84] Fail to run pre-flight checks: [preflight] Some fatal errors occurred:
	[ERROR NodeEdgeWorkerLabel]: the nodes [node2] has already been labeled as a OpenYurt node
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
[root@master bin]#
[root@master bin]#
[root@master bin]# ./yurtctl convert -c master --node-servant-image=openyurt/node-servant:test --ignore-preflight-errors=NodeEdgeWorkerLabel
[preflight] Running pre-flight checks
	[WARNING NodeEdgeWorkerLabel]: the nodes [node2] has already been labeled as a OpenYurt node
[preflight] Running node-servant-preflight-convert jobs to check on edge and cloud nodes. Job running may take a long time, and job failure will affect the execution of the next stage
[runConvert] Label all nodes with edgeworker label, annotate all nodes with autonomy annotation
[runConvert] Deploying yurt-controller-manager
[runConvert] Running jobs for convert. Job running may take a long time, and job failure will not affect the execution of the next stage
[runConvert] Running disable-node-controller jobs to disable node-controller
	[INFO] servant job(yurtctl-disable-node-controller-master) has succeeded
[runConvert] Running node-servant-convert jobs to deploy the yurt-hub and reset the kubelet service on edge and cloud nodes
	[INFO] servant job(node-servant-convert-node2) has succeeded
	[INFO] servant job(node-servant-convert-node1) has succeeded
	[INFO] servant job(node-servant-convert-master) has succeeded
[runConvert] If any job fails, you can get job information through 'kubectl get jobs -n kube-system' to debug.
	Note that before the next conversion, please delete all related jobs so as not to affect the conversion.
[root@master bin]#

@Peeknut changed the title from "[WIP]Yurtctl: add precheck for reducing convert failure" to "Yurtctl: add precheck for reducing convert failure" Dec 24, 2021
@openyurt-bot removed the do-not-merge/work-in-progress label Dec 24, 2021
@Peeknut
Member Author

Peeknut commented Dec 24, 2021

@adamzhoul PTAL

@adamzhoul
Member

Sorry for the late reply.
The PR is long, so I didn't read every line carefully, but I focused on the core checker part and it looks alright.
/lgtm

@openyurt-bot added and removed the lgtm label Dec 27, 2021
@Peeknut
Member Author

Peeknut commented Dec 27, 2021

@adamzhoul I resolved some conflicts: the package name in pkg/yurtctl/util/kubernetes/util.go was updated, and nothing else was changed. However, this removed the lgtm label, so please add it again. Thanks a lot! :)

@adamzhoul
Member

/lgtm

@openyurt-bot added the lgtm label Dec 27, 2021
@rambohe-ch
Member

/lgtm
/approve

@openyurt-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Peeknut, rambohe-ch

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openyurt-bot added the approved label Dec 28, 2021
@openyurt-bot merged commit 4455011 into openyurtio:master Dec 28, 2021
MrGirl pushed a commit to MrGirl/openyurt that referenced this pull request Mar 29, 2022

Successfully merging this pull request may close these issues.

[feature request]add yurtctl precheck command for reducing yurtctl convert failure.
5 participants