[BPF] Cannot establish connection to forwarded backend node via external IP #9675

Open
happytreees opened this issue Jan 3, 2025 · 0 comments

Expected Behavior

When connecting to a node via its external IP, if the connection is forwarded to another node, the forwarded connection should use the external IP of that node.

Current Behavior

When attempting to connect to a service of type NodePort using the external IP of a node, if the workload endpoint is not on that node, the connection is forwarded using eBPF. Once forwarded, the connection hangs in SYN-SENT because packets with unknown source are dropped on the internal IP. Because the external client cannot communicate over the nodes' private network, the connection is never established.

We are currently using eBPF with DSR enabled. The problem persists even with DSR disabled and tunnel mode used instead.

Steps to Reproduce (for bugs)

  1. Deploy Calico to a K8S cluster whose nodes have both external and internal IPs, with eBPF and DSR enabled.
  2. Deploy a NodePort service with a single replica:
kubectl create deploy --image=nginx:1.23 nginx --replicas=1
kubectl expose deploy nginx --type=NodePort --port=8080 --target-port=80
  3. Attempt to connect to the service using the external IP of a node that the workload is not hosted on:
curl -IL external-ip:nodeport

This request will hang if the nginx container is not running on the node whose external IP was requested.
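
The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, not a verified procedure: it assumes the Deployment carries the `app=nginx` label that `kubectl create deploy` applies by default, that the nodes report an `ExternalIP` address, and that the Service is named `nginx`. The function names `pick_remote_node_ip` and `repro` are hypothetical.

```shell
#!/bin/sh
# pick_remote_node_ip POD_NODE
# Reads "node-name external-ip" lines on stdin and prints the external IP
# of the first node that is NOT hosting the pod (hypothetical helper).
pick_remote_node_ip() {
    awk -v pod_node="$1" '$1 != pod_node && $2 != "" { print $2; exit }'
}

# repro: assumed wrapper around the steps above; needs kubectl access.
repro() {
    # Node currently running the single nginx replica.
    pod_node=$(kubectl get pod -l app=nginx \
        -o jsonpath='{.items[0].spec.nodeName}')
    # NodePort allocated to the Service.
    node_port=$(kubectl get svc nginx \
        -o jsonpath='{.spec.ports[0].nodePort}')
    # External IP of some other node.
    remote_ip=$(kubectl get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="ExternalIP")].address}{"\n"}{end}' \
        | pick_remote_node_ip "$pod_node")
    # --max-time keeps the hang described above from blocking forever.
    curl -IL --max-time 10 "http://${remote_ip}:${node_port}"
}
```

Calling `repro` should time out when the bug is present; the node-selection logic can be checked on its own by piping sample "name ip" lines into `pick_remote_node_ip`.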

Context

NAT Table

<external-ip-redacted> port 32069 proto 6 id 45 count 1 local 0
        45:0     10.244.177.142:5671

The NAT table shows the correct backend for the frontend.

Observed BPF-forwarded requests that hang in SYN-SENT:

TCP <redacted-client-ip>:16348 -> <redacted-external-ip>:32069 -> 10.244.177.142:5671 external client, service forwarded to/from 10.52.96.9  Age: 31m17.180024127s Active ago 31m17.180024127s SYN-SENT

The counters show the packets being dropped on the internal interface:

+----------+--------------------------------+----------+---------+-----+
| CATEGORY |              TYPE              | INGRESS  | EGRESS  | XDP |
+----------+--------------------------------+----------+---------+-----+
| Accepted | by another program             |        0 |  288787 | N/A |
|          | by failsafe                    |        1 |     357 | N/A |
|          | by policy                      |        0 |       0 | N/A |
| Dropped  | by policy                      |        0 |       0 | N/A |
|          | failed decapsulation           |        0 |       0 | N/A |
|          | failed encapsulation           |        0 |       0 | N/A |
|          | incorrect checksum             |        0 |       0 | N/A |
|          | malformed IP packets           |        0 |       0 | N/A |
|          | packets hitting blackhole      |        0 |       0 | N/A |
|          | route                          |          |         |     |
|          | packets with unknown route     |        0 |       0 | N/A |
|          | packets with unknown source    |   189050 |       0 | N/A |
|          | packets with unsupported IP    |        0 |       0 | N/A |
|          | options                        |          |         |     |
|          | too short packets              |        0 |       0 | N/A |
| Total    | packets                        | 96113304 | 4199273 | N/A |
+----------+--------------------------------+----------+---------+-----+
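
To track whether the "packets with unknown source" drop count grows while a connection attempt is hanging, the INGRESS value can be pulled out of a saved copy of the table above. A minimal sketch, assuming the table layout shown here; the function name `unknown_source_ingress` and the file name in the usage note are hypothetical.

```shell
#!/bin/sh
# unknown_source_ingress
# Reads a counters table (in the layout shown above) on stdin and prints
# the INGRESS value of the "packets with unknown source" row.
unknown_source_ingress() {
    # Split on '|': field 3 is TYPE, field 4 is INGRESS.
    awk -F'[|]' '/packets with unknown source/ { gsub(/ /, "", $4); print $4 }'
}
```

For example, `unknown_source_ingress < counters.txt` on two dumps taken a few seconds apart shows whether the counter is still increasing.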

tc output showing dropped packets:

qdisc clsact ffff: parent ffff:fff1 
 Sent 27385993003 bytes 177099157 pkt (dropped 189065, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
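
The dropped count in the tc statistics above can be extracted the same way and compared against the BPF counter. A minimal sketch, assuming the `tc -s qdisc` output format shown here; the function name `tc_dropped` is hypothetical, and the interface name in the usage note is a placeholder.

```shell
#!/bin/sh
# tc_dropped
# Reads `tc -s qdisc` output on stdin and prints the number after
# "(dropped " for each statistics line it finds.
tc_dropped() {
    sed -n 's/.*(dropped \([0-9][0-9]*\),.*/\1/p'
}
```

Typical use would be `tc -s qdisc show dev eth0 | tc_dropped`, with `eth0` replaced by the node's internal interface. If several qdiscs are attached, one number is printed per qdisc.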

External IP connection:

nc -zvw5 <external-ip-redacted> 32069
nc: connectx to <external-ip-redacted> port 32069 (tcp) failed: Operation timed out

Internal IP connection:

nc -zvw5 10.52.96.5 32069
Connection to 10.52.96.5 32069 port [tcp/*] succeeded!

Your Environment

  • Calico version: 3.29.0
  • Calico dataplane: eBPF
  • Orchestrator version (e.g. kubernetes, mesos, rkt): Kubernetes
  • Operating System and version: Ubuntu 22.04