calico-kube-controllers does not sync k8s node when a pod is Pending with no pod IP because its node was deleted #9618

Open
hangzhouwanjun opened this issue Dec 19, 2024 · 0 comments


The datastore is etcd, and Calico is deployed on Kubernetes.

There are three nodes: node1, node2, and node3. After node2 is deleted, calico-kube-controllers does not sync the Kubernetes node deletion.

calico-kube-controllers recognizes that the node has been removed:
Cleaning up IPAM resources for deleted node node="node-223-174-vip-176"
but the node information is still in etcd:
/calico/resources/v3/projectcalico.org/nodes/node-223-174-vip-176

The Kubernetes node is not found, because node-223-174-vip-176 was deleted:

NAME                   STATUS     ROLES                  AGE   VERSION
node-223-163-vip-176   Ready      <none>                 46h   v1.21.14
node-223-173-vip-176   Ready      control-plane,master   46h   v1.21.14
node-223-175-vip-176   NotReady   control-plane,master   46h   v1.21.14 

Pod status:
service-software seasqlcache-cluster-1 0/1 Pending

After setting the calico-kube-controllers LOG_LEVEL to debug, I found what looks like a bug in the code: if a pod is Pending and has no pod IP, the Calico node information stays in etcd even after the Kubernetes node is deleted, and it is never removed.
The relevant excerpts are from kube-controllers/pkg/controllers/node/ipam.go:

			if c.allocationIsValid(a, true) {
				// Allocation is still valid. We can't cleanup the node yet, even
				// if it appears to be deleted, because the allocation's validity breaks
				// our confidence.
				canDelete = false
				a.markValid()
				continue
			}

	if p.Status.PodIP == "" || len(p.Status.PodIPs) == 0 {
		// The pod hasn't received an IP yet.
		log.Debugf("Pod IP has not yet been reported, consider allocation valid")
		return true
	}

		if !kubernetesNodeExists {
			if !canDelete {
				// There are still valid allocations on the node.
				logc.Infof("Can't cleanup node yet - IPs still in use on this node")
				continue
			}
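
To make the interaction between these excerpts easier to follow, here is a minimal, self-contained sketch of the decision flow as I understand it. It is not the Calico implementation: the types `pod` and `allocation` and the function `canCleanupNode` are hypothetical, simplified stand-ins, and only the empty pod IP branch is copied from the excerpt above.

```go
package main

import "fmt"

// pod is a simplified stand-in for the Kubernetes Pod fields the check reads.
type pod struct {
	Name   string
	PodIP  string   // mirrors p.Status.PodIP
	PodIPs []string // mirrors p.Status.PodIPs
}

// allocation is a hypothetical, simplified IPAM allocation record.
type allocation struct {
	IP   string
	Node string
	Pod  pod
}

// allocationIsValid mimics the branch quoted in the issue: a pod that has
// not reported an IP is treated as still owning its allocation.
func allocationIsValid(a allocation) bool {
	if a.Pod.PodIP == "" || len(a.Pod.PodIPs) == 0 {
		return true
	}
	// Assumption for this sketch: otherwise the allocation is valid only
	// if the pod reports the allocated IP.
	for _, ip := range a.Pod.PodIPs {
		if ip == a.IP {
			return true
		}
	}
	return false
}

// canCleanupNode reports whether a node that is gone from Kubernetes could
// be cleaned up. Any allocation considered valid blocks the cleanup.
func canCleanupNode(kubernetesNodeExists bool, allocs []allocation) bool {
	if kubernetesNodeExists {
		return false // the node still exists, so there is nothing to clean up
	}
	canDelete := true
	for _, a := range allocs {
		if allocationIsValid(a) {
			canDelete = false
		}
	}
	return canDelete
}

func main() {
	// A Pending pod with no IP, whose allocation points at the deleted node.
	allocs := []allocation{{
		IP:   "177.177.73.70",
		Node: "node-223-174-vip-176",
		Pod:  pod{Name: "seasqlcache-slave-0-0"},
	}}
	fmt.Println(canCleanupNode(false, allocs)) // prints "false"
}
```

In this sketch the result stays `false` for as long as the pod remains Pending without an IP, which matches the behavior described in this issue: the node is never cleaned up.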

The debug log shows entries such as:

Failed to release block affinities for node calicoNode="node-223-174-vip-176" error=block '177.177.73.64/26' is not empty
Error cleaning up node error=block '177.177.73.64/26' is not empty node="node-223-174-vip-176"
Periodic IPAM sync failed error=block '177.177.73.64/26' is not empty 
Checking cache for pod handle="k8s-pod-network.5b1d21c9a440ab14388386a11d41a848e231c9fa01216c87f1de5885d424b1fc" ip="177.177.73.70" node="node-223-174-vip-176" pod="service-software/seasqlcache-slave-0-0" 
Pod IP has not yet been reported, consider allocation valid

The Calico block '177.177.73.64/26' is still in etcd, even though the Kubernetes node node-223-174-vip-176 has been deleted:

"attributes": [
    {
      "handle_id": "k8s-pod-network.5b1d21c9a440ab14388386a11d41a848e231c9fa01216c87f1de5885d424b1fc",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seasqlcache-slave-0-0",
        "timestamp": "2024-12-17 13:05:15.100894402 +0000 UTC"
      }
    },
    {
      "handle_id": "k8s-pod-network.6fb2121e0fb0a7775f5b818ff2e6b2497cd49cc957d46c785397ac64f19e9111",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seaio-1",
        "timestamp": "2024-12-17 13:10:15.319910662 +0000 UTC"
      }
    },
    {
      "handle_id": "k8s-pod-network.af7622acf845da0b3c7e0f433b3be2713d6c848633cabf3734261274772193df",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seamq-base-controller-2",
        "timestamp": "2024-12-17 13:11:08.64976728 +0000 UTC"
      }
    },
    {
      "handle_id": "k8s-pod-network.da9bbf946a481f28501c54b610827471233fdc96043223620d2ac389b2645e2c",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seasqlcache-cluster-1",
        "timestamp": "2024-12-17 13:13:52.446225564 +0000 UTC"
      }
    }
  ] 
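
For anyone checking their own cluster, the block attributes above can be scanned for allocations that still reference a deleted node. The following is a rough sketch, assuming the `"attributes"` fragment is wrapped in a JSON object; the names `block`, `blockAttribute`, `podsOnNode`, and `sample` are hypothetical, and the sample is shortened to two of the entries shown above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockAttribute mirrors one entry of the "attributes" array shown above.
type blockAttribute struct {
	HandleID  string            `json:"handle_id"`
	Secondary map[string]string `json:"secondary"`
}

// block holds only the part of the block document this sketch needs.
type block struct {
	Attributes []blockAttribute `json:"attributes"`
}

// sample is a shortened copy of the attributes from this issue, wrapped in
// braces so that it parses as a JSON object (an assumption of this sketch).
const sample = `{
  "attributes": [
    {
      "handle_id": "k8s-pod-network.5b1d21c9a440ab14388386a11d41a848e231c9fa01216c87f1de5885d424b1fc",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seasqlcache-slave-0-0"
      }
    },
    {
      "handle_id": "k8s-pod-network.da9bbf946a481f28501c54b610827471233fdc96043223620d2ac389b2645e2c",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seasqlcache-cluster-1"
      }
    }
  ]
}`

// podsOnNode returns "namespace/pod" for every attribute that references node.
func podsOnNode(raw []byte, node string) ([]string, error) {
	var b block
	if err := json.Unmarshal(raw, &b); err != nil {
		return nil, err
	}
	var pods []string
	for _, a := range b.Attributes {
		if a.Secondary["node"] == node {
			pods = append(pods, a.Secondary["namespace"]+"/"+a.Secondary["pod"])
		}
	}
	return pods, nil
}

func main() {
	pods, err := podsOnNode([]byte(sample), "node-223-174-vip-176")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range pods {
		fmt.Println(p)
	}
}
```

Any pod listed for a node that no longer exists in Kubernetes is a candidate for the stuck allocation described here.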

Expected Behavior

Delete a Kubernetes node.
calico-kube-controllers deletes the corresponding node information from etcd.

Current Behavior

Delete a Kubernetes node.
calico-kube-controllers does not delete the corresponding node information from etcd.

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

  • Calico version 3.23.5
  • Calico dataplane (iptables, windows etc.)
  • Orchestrator version (e.g. kubernetes, mesos, rkt): 1.21.14
  • Operating System and version:
  • Link to your project (optional):