This repository has been archived by the owner on Oct 10, 2023. It is now read-only.
Deployed a dev workload cluster on vSphere - 1 control plane, 1 worker node.
Scaled worker nodes from 1 to 3. Success!
Scaled control plane nodes from 1 to 3. Failure!
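For context, scaling the control plane comes down to raising `replicas` on the cluster's KubeadmControlPlane object. A sketch of the relevant fields, using the object names that appear in the controller logs below (the `apiVersion` is an assumption based on the Cluster API release current at the time):

```yaml
# Sketch only: name/namespace taken from the KCP logs in this report;
# apiVersion assumed for the Cluster API v1alpha3 era.
apiVersion: controlplane.cluster.x-k8s.io/v1alpha3
kind: KubeadmControlPlane
metadata:
  name: workload-control-plane
  namespace: default
spec:
  replicas: 3   # was 1; the KCP controller then logs "Scaling up control plane"
```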
What I observed was as follows:
The second control plane node is successfully cloned, powered on, and receives an IP address via DHCP.
The original control plane node appears to lose its network information (both the VM IP and the VIP for the K8s API server), as observed in the vSphere client UI.
The K8s API server is no longer reachable via kubectl commands.
CPU usage on the original control plane node/VM triggers a vSphere alarm (4.774 GHz used).
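The unreachable-API-server symptom can be triaged one layer below kubectl with a raw TCP probe of the API-server VIP. This is a hypothetical triage helper, not part of any TKG or Cluster API tooling; the address is the VIP that appears in the controller logs below:

```python
import socket

def api_server_reachable(host: str, port: int = 6443, timeout: float = 3.0) -> bool:
    """Cheap TCP-level check mirroring what kubectl and the KCP controller
    experience: can we even open a connection to the API server endpoint?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# VIP from the controller logs; expected to return False while the symptom is present.
print(api_server_reachable("10.27.51.243"))
```

If this returns False while the VM console shows the node up, the problem is at the network/VIP layer rather than inside the API server itself.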
I switched to the management cluster to look at some logs:
% kubectl logs capi-kubeadm-control-plane-controller-manager-5596569b-q6rxz -n capi-kubeadm-control-plane-system manager
I0705 08:24:34.446929 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "Desired"=3 "Existing"=1
I0705 08:24:34.923501 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default"
I0705 08:24:35.396995 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "Desired"=3 "Existing"=2
I0705 08:24:35.399412 1 scale.go:206] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "failures"="[machine workload-control-plane-gpgp6 does not have APIServerPodHealthy condition, machine workload-control-plane-gpgp6 does not have ControllerManagerPodHealthy condition, machine workload-control-plane-gpgp6 does not have SchedulerPodHealthy condition, machine workload-control-plane-gpgp6 does not have EtcdPodHealthy condition, machine workload-control-plane-gpgp6 does not have EtcdMemberHealthy condition]"
...
I0705 08:26:26.708159 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default"
I0705 08:27:27.066038 1 controller.go:355] controllers/KubeadmControlPlane "msg"="Scaling up control plane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "Desired"=3 "Existing"=2
I0705 08:27:27.066330 1 scale.go:206] controllers/KubeadmControlPlane "msg"="Waiting for control plane to pass preflight checks" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "failures"="[machine workload-control-plane-kvj7j reports APIServerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-kvj7j reports ControllerManagerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-kvj7j reports SchedulerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-kvj7j reports EtcdPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-kvj7j reports EtcdMemberHealthy condition is unknown (Failed to get the node which is hosting the etcd member), machine workload-control-plane-gpgp6 reports APIServerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-gpgp6 reports ControllerManagerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-gpgp6 reports SchedulerPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-gpgp6 reports EtcdPodHealthy condition is unknown (Failed to get the node which is hosting this component), machine workload-control-plane-gpgp6 reports EtcdMemberHealthy condition is unknown (Failed to get the node which is hosting the etcd member)]"
I0705 08:27:42.486017 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="vcsa06-octoc" "kubeadmControlPlane"="vcsa06-octoc-control-plane" "namespace"="tkg-system"
I0705 08:27:57.078706 1 controller.go:182] controllers/KubeadmControlPlane "msg"="Could not connect to workload cluster to fetch status" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "err"="failed to create remote cluster client: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: context deadline exceeded"
I0705 08:27:57.117486 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default"
I0705 08:28:57.168628 1 controller.go:182] controllers/KubeadmControlPlane "msg"="Could not connect to workload cluster to fetch status" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "err"="failed to create remote cluster client: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: context deadline exceeded"
E0705 08:28:57.187424 1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="cannot get remote client to workload cluster: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)" "controller"="kubeadmcontrolplane" "name"="workload-control-plane" "namespace"="default"
I0705 08:28:57.188000 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default"
I0705 08:29:57.225857 1 controller.go:182] controllers/KubeadmControlPlane "msg"="Could not connect to workload cluster to fetch status" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "err"="failed to create remote cluster client: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
E0705 08:29:57.227366 1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="cannot get remote client to workload cluster: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: context deadline exceeded" "controller"="kubeadmcontrolplane" "name"="workload-control-plane" "namespace"="default"
I0705 08:29:57.227913 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default"
I0705 08:30:52.225222 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="vcsa06-octoc" "kubeadmControlPlane"="vcsa06-octoc-control-plane" "namespace"="tkg-system"
I0705 08:30:57.267482 1 controller.go:182] controllers/KubeadmControlPlane "msg"="Could not connect to workload cluster to fetch status" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default" "err"="failed to create remote cluster client: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: context deadline exceeded"
E0705 08:30:57.268704 1 controller.go:257] controller-runtime/controller "msg"="Reconciler error" "error"="cannot get remote client to workload cluster: default/workload: Get https://10.27.51.243:6443/api?timeout=30s: net/http: request canceled (Client.Timeout exceeded while awaiting headers)" "controller"="kubeadmcontrolplane" "name"="workload-control-plane" "namespace"="default"
I0705 08:30:57.269114 1 controller.go:244] controllers/KubeadmControlPlane "msg"="Reconcile KubeadmControlPlane" "cluster"="workload" "kubeadmControlPlane"="workload-control-plane" "namespace"="default"
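The bracketed `failures` string in the preflight log lines above packs ten conditions into one line and is hard to scan. A small helper (hypothetical, not part of any Cluster API tooling) that groups the failures per machine:

```python
def parse_preflight_failures(failures: str) -> dict:
    """Group KCP preflight-check failures by machine name.

    Expects the bracketed "failures" string logged by the
    kubeadm-control-plane controller, e.g.
    "[machine m1 does not have APIServerPodHealthy condition, machine m1 ...]".
    """
    result = {}
    # Entries are separated by ", machine "; the first also starts with "machine ".
    for entry in failures.strip("[]").split(", machine "):
        entry = entry.removeprefix("machine ").strip()
        name, _, detail = entry.partition(" ")
        result.setdefault(name, []).append(detail)
    return result

failures = ("[machine workload-control-plane-gpgp6 does not have "
            "APIServerPodHealthy condition, machine "
            "workload-control-plane-gpgp6 does not have "
            "EtcdMemberHealthy condition]")
print(parse_preflight_failures(failures))
```

Running this over the 08:27:27 log line above makes it obvious at a glance that both machines are failing all five health conditions for the same underlying reason (the node object cannot be fetched).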
To try to regain access to the cluster, I reset the original control plane node/VM via the vSphere client. After it rebooted and I waited a few minutes, the node regained its networking configuration and I could once again access the API server.
However, the control plane is still not reconciled.
The CSI issue - Invalid attach limit value 0 cannot be added to CSINode object for "csi.vsphere.vmware.com" - appears to be unrelated, as it occurs on all nodes, even on a fresh deployment.
This issue also seems to impact the initial deployment of control planes. If I deploy a "dev" control plane with a single node, it comes up immediately. If I deploy a "prod" control plane, it seems to hit the same issue as scaling.
jpmcb transferred this issue from vmware-tanzu/community-edition on Oct 11, 2021
Bug Report
The kubelet status on the new node also shows an issue with the CSI driver, but I cannot tell whether this is the root cause.
I have managed to reproduce this scenario twice with two different TKG clusters on vSphere.
Expected Behavior
That the control plane would scale seamlessly.
Steps to Reproduce the Bug
1. Deploy a dev workload cluster on vSphere (1 control plane node, 1 worker node).
2. Scale worker nodes from 1 to 3 (succeeds).
3. Scale control plane nodes from 1 to 3 (fails as described above).
Environment Details
Version: v0.5.0
Host OS: macOS Big Sur 11.4