calico-kube-controllers does not sync a deleted k8s node when a pod is Pending with no pod IP because its node was deleted #9618

Open
hangzhouwanjun opened this issue Dec 19, 2024 · 2 comments


@hangzhouwanjun

The datastore is etcd, and Calico is deployed on Kubernetes.

The cluster has three nodes: node1, node2, and node3. After node2 is deleted from Kubernetes, calico-kube-controllers does not sync the deletion to the Calico datastore.

calico-kube-controllers does recognize that the node has been removed:
Cleaning up IPAM resources for deleted node node="node-223-174-vip-176"
but the node information is still in etcd:
/calico/resources/v3/projectcalico.org/nodes/node-223-174-vip-176

The Kubernetes node is no longer found, because node-223-174-vip-176 was deleted:

NAME                   STATUS     ROLES                  AGE   VERSION
node-223-163-vip-176   Ready      <none>                 46h   v1.21.14
node-223-173-vip-176   Ready      control-plane,master   46h   v1.21.14
node-223-175-vip-176   NotReady   control-plane,master   46h   v1.21.14 

Pod status:
service-software seasqlcache-cluster-1 0/1 Pending

With the calico-kube-controllers LOG_LEVEL set to debug, I found what looks like a bug in the code below. If a pod is Pending and has no pod IP, its allocation is always considered valid, so the Calico node information stays in etcd even after the Kubernetes node is deleted, and it is never removed.
kube-controllers/pkg/controllers/node/ipam.go

			if c.allocationIsValid(a, true) {
				// Allocation is still valid. We can't cleanup the node yet, even
				// if it appears to be deleted, because the allocation's validity breaks
				// our confidence.
				canDelete = false
				a.markValid()
				continue
			}

	if p.Status.PodIP == "" || len(p.Status.PodIPs) == 0 {
		// The pod hasn't received an IP yet.
		log.Debugf("Pod IP has not yet been reported, consider allocation valid")
		return true
	}

		if !kubernetesNodeExists {
			if !canDelete {
				// There are still valid allocations on the node.
				logc.Infof("Can't cleanup node yet - IPs still in use on this node")
				continue
			}
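
To make the interaction between these three excerpts easier to follow, here is a minimal, self-contained sketch of the logic as I understand it. It is not the real controller code: the `pod` and `allocation` types, the map keyed by `namespace/name`, and the `canCleanupNode` helper are simplified assumptions for illustration, and the real `allocationIsValid` performs more checks than shown here.

```go
package main

import "fmt"

// pod is a simplified, hypothetical stand-in for a Kubernetes Pod.
type pod struct {
	Namespace, Name string
	PodIPs          []string // empty while the pod is Pending
}

// allocation is a simplified, hypothetical stand-in for an IPAM allocation.
type allocation struct {
	IP, Node, Namespace, Pod string
}

// allocationIsValid models only the branch quoted in the issue: a pod that
// exists but has not reported an IP yet makes the allocation count as valid.
func allocationIsValid(a allocation, pods map[string]pod) bool {
	p, ok := pods[a.Namespace+"/"+a.Pod]
	if !ok {
		return false // no pod with this name: nothing keeps the allocation alive
	}
	if len(p.PodIPs) == 0 {
		return true // "Pod IP has not yet been reported, consider allocation valid"
	}
	for _, ip := range p.PodIPs {
		if ip == a.IP {
			return true
		}
	}
	return false
}

// canCleanupNode reports whether a deleted node has no valid allocations left.
func canCleanupNode(node string, allocs []allocation, pods map[string]pod) bool {
	for _, a := range allocs {
		if a.Node == node && allocationIsValid(a, pods) {
			return false // "Can't cleanup node yet - IPs still in use on this node"
		}
	}
	return true
}

func main() {
	const node = "node-223-174-vip-176"
	allocs := []allocation{{
		IP: "177.177.73.70", Node: node,
		Namespace: "service-software", Pod: "seasqlcache-slave-0-0",
	}}
	// A Pending pod with the same namespace/name and no IP.
	pods := map[string]pod{
		"service-software/seasqlcache-slave-0-0": {
			Namespace: "service-software", Name: "seasqlcache-slave-0-0",
		},
	}
	fmt.Println("can clean up node:", canCleanupNode(node, allocs, pods))
}
```

In this model the node can only be cleaned up once no pod with a matching name exists, or once the pod reports an IP that differs from the recorded one. A pod that stays Pending indefinitely blocks cleanup indefinitely.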

The debug log shows entries such as:

Failed to release block affinities for node calicoNode="node-223-174-vip-176" error=block '177.177.73.64/26' is not empty
Error cleaning up node error=block '177.177.73.64/26' is not empty node="node-223-174-vip-176"
Periodic IPAM sync failed error=block '177.177.73.64/26' is not empty 
Checking cache for pod handle="k8s-pod-network.5b1d21c9a440ab14388386a11d41a848e231c9fa01216c87f1de5885d424b1fc" ip="177.177.73.70" node="node-223-174-vip-176" pod="service-software/seasqlcache-slave-0-0" 
Pod IP has not yet been reported, consider allocation valid

The Calico block '177.177.73.64/26' is still in etcd with the attributes below, even though the Kubernetes node node-223-174-vip-176 has been deleted:

"attributes": [
    {
      "handle_id": "k8s-pod-network.5b1d21c9a440ab14388386a11d41a848e231c9fa01216c87f1de5885d424b1fc",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seasqlcache-slave-0-0",
        "timestamp": "2024-12-17 13:05:15.100894402 +0000 UTC"
      }
    },
    {
      "handle_id": "k8s-pod-network.6fb2121e0fb0a7775f5b818ff2e6b2497cd49cc957d46c785397ac64f19e9111",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seaio-1",
        "timestamp": "2024-12-17 13:10:15.319910662 +0000 UTC"
      }
    },
    {
      "handle_id": "k8s-pod-network.af7622acf845da0b3c7e0f433b3be2713d6c848633cabf3734261274772193df",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seamq-base-controller-2",
        "timestamp": "2024-12-17 13:11:08.64976728 +0000 UTC"
      }
    },
    {
      "handle_id": "k8s-pod-network.da9bbf946a481f28501c54b610827471233fdc96043223620d2ac389b2645e2c",
      "secondary": {
        "namespace": "service-software",
        "node": "node-223-174-vip-176",
        "pod": "seasqlcache-cluster-1",
        "timestamp": "2024-12-17 13:13:52.446225564 +0000 UTC"
      }
    }
  ] 
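
To see at a glance which pods still hold allocations in a block for a given node, a dump like the one above can be filtered with a few lines of Go. This is only a sketch: the `blockAttribute` type and the `podsOnNode` helper are names I made up, and it assumes the input is the JSON array that follows `"attributes":`, with every `secondary` value being a string as in the dump.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// blockAttribute is a hypothetical type matching the fields in the dump above.
type blockAttribute struct {
	HandleID  string            `json:"handle_id"`
	Secondary map[string]string `json:"secondary"`
}

// podsOnNode returns the sorted "namespace/pod" names of all attributes
// whose secondary "node" field equals the given node name.
func podsOnNode(raw []byte, node string) ([]string, error) {
	var attrs []blockAttribute
	if err := json.Unmarshal(raw, &attrs); err != nil {
		return nil, err
	}
	var out []string
	for _, a := range attrs {
		if a.Secondary["node"] == node {
			out = append(out, a.Secondary["namespace"]+"/"+a.Secondary["pod"])
		}
	}
	sort.Strings(out)
	return out, nil
}

func main() {
	// Two of the four entries from the dump above, for brevity.
	raw := []byte(`[
	  {"handle_id": "k8s-pod-network.5b1d21c9a440ab14388386a11d41a848e231c9fa01216c87f1de5885d424b1fc",
	   "secondary": {"namespace": "service-software", "node": "node-223-174-vip-176",
	                 "pod": "seasqlcache-slave-0-0",
	                 "timestamp": "2024-12-17 13:05:15.100894402 +0000 UTC"}},
	  {"handle_id": "k8s-pod-network.da9bbf946a481f28501c54b610827471233fdc96043223620d2ac389b2645e2c",
	   "secondary": {"namespace": "service-software", "node": "node-223-174-vip-176",
	                 "pod": "seasqlcache-cluster-1",
	                 "timestamp": "2024-12-17 13:13:52.446225564 +0000 UTC"}}
	]`)
	pods, err := podsOnNode(raw, "node-223-174-vip-176")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range pods {
		fmt.Println(p)
	}
}
```

Each name printed is a pod whose allocation may be keeping the block non-empty, so these are the pods to check for a Pending status.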

Expected Behavior

Delete a Kubernetes node.
calico-kube-controllers deletes the node information from etcd.

Current Behavior

Delete a Kubernetes node.
calico-kube-controllers does not delete the node information from etcd.

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

  • Calico version 3.23.5
  • Calico dataplane (iptables, windows etc.)
  • Orchestrator version (e.g. kubernetes, mesos, rkt): 1.21.14
  • Operating System and version:
  • Link to your project (optional):
@caseydavenport
Member

@hangzhouwanjun I think this means you just need to drain your Kubernetes nodes of pods before removing them?

This comment explains why we need this check:

				// Allocation is still valid. We can't cleanup the node yet, even
				// if it appears to be deleted, because the allocation's validity breaks
				// our confidence.

I think the problem is that you have a pod attempting to be installed on a node that doesn't exist, if I understand correctly. Deleting that pod should fix the problem.

@hangzhouwanjun
Author

Thanks @caseydavenport
That pod is bound to the deleted node, and its parent kind is StatefulSet, so even if I delete the pod, the recreated pod has the same name. As a result, the pod stays Pending forever, and calico-kube-controllers never removes the node information from etcd.
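
The reason deleting the pod does not help can be sketched as follows. StatefulSet pods are named `<statefulset name>-<ordinal>`, so a recreated pod gets the same name as the one it replaces. The sketch assumes the StatefulSet is called `seasqlcache-cluster` (inferred from the pod name) and that the allocation is matched to a pod by namespace and name; `allocationKey` is a hypothetical helper, not a Calico function.

```go
package main

import "fmt"

// statefulSetPodName builds a pod name the way StatefulSets do:
// <statefulset name>-<ordinal>.
func statefulSetPodName(setName string, ordinal int) string {
	return fmt.Sprintf("%s-%d", setName, ordinal)
}

// allocationKey is a hypothetical lookup key: namespace/pod name.
func allocationKey(namespace, podName string) string {
	return namespace + "/" + podName
}

func main() {
	// Key recorded in the block attributes for the original pod.
	recorded := allocationKey("service-software", "seasqlcache-cluster-1")

	// Key of the pod recreated by the StatefulSet after deletion.
	// The set name "seasqlcache-cluster" is an assumption.
	recreated := allocationKey("service-software",
		statefulSetPodName("seasqlcache-cluster", 1))

	fmt.Println("recreated pod matches old allocation:", recorded == recreated)
}
```

Because the keys are identical, the old allocation still finds a Pending pod with no IP after every delete and recreate cycle.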
