VXLAN not working when tunnel address is borrowed #6160
Comments
/kind bug |
friendly ping :) @caseydavenport |
Hey sorry for the delay, have been out of the office for a bit.
I think this is a bug - there's no reason at a networking level that the IP needs to be from within the block on that node. |
Same symptom being described here: #5595 |
Thanks for your reply, @caseydavenport! Our users use a CIDR mask of 20. Looking at the source code, I found that the logic for assigning the tunnel IP is the same as the logic for assigning pod IPs; I think some distinction should be made here. I have the following two suggestions for this problem:
|
There shouldn't be a reason that Calico can't use a borrowed IP for the tunnel address. There is likely another fix that needs to be made rather than limiting the tunnel address in the way you described. That wouldn't fix the ultimate problem of limiting the cluster size to 64 nodes (any nodes past the number of blocks in the cluster would result in non-functional nodes without a tunnel address) |
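The 64-node figure above follows from simple prefix arithmetic. Here is a minimal sketch, assuming a block size of /26 (which I believe is Calico's IPv4 default) and a hypothetical pool address of `10.244.0.0/20`; neither value is stated in this thread.

```python
import ipaddress


def block_count(pool_cidr: str, block_prefix: int) -> int:
    """Number of fixed-size IPAM blocks that fit in a pool."""
    pool = ipaddress.ip_network(pool_cidr)
    if block_prefix < pool.prefixlen:
        raise ValueError("block cannot be larger than the pool")
    # Each extra prefix bit doubles the number of blocks.
    return 2 ** (block_prefix - pool.prefixlen)


# Hypothetical /20 pool with an assumed /26 block size.
print(block_count("10.244.0.0/20", 26))  # 64
```

With one block per node, any node beyond the 64th has to borrow addresses from a block that another node owns.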
When there are not enough blocks, a /31 MicroBlock is assigned from another block, and the tunnel IP is taken from that MicroBlock. The problem only affects Pods using HostNetwork on the new node communicating with non-HostNetwork Pods on other nodes, and does not affect other communication scenarios, so we just need to solve the tunnel IP routing problem: also add a route like "x.xx.xx.xx/31 via tunlIP dev ifcfg-tunl". More nodes can then be supported with relatively small changes. |
Ah, yes this makes sense. We're not programming a return route that tells pods where the tunnel address is (normally that is handled by the route for the block itself). |
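To illustrate the missing return route, here is a rough sketch of the check involved: a tunnel address needs its own route only when none of the node's own blocks covers it. The function name `tunnel_route_needed` and the default /32 host route are assumptions made for illustration; this is not Calico's code, and the suggestion earlier in the thread used a /31 instead.

```python
import ipaddress


def tunnel_route_needed(tunnel_ip, node_blocks, prefix=32):
    """Return the extra route a remote node would need to reach tunnel_ip,
    or None if one of the owning node's blocks already covers it.

    Hypothetical helper for illustration only.
    """
    ip = ipaddress.ip_address(tunnel_ip)
    for block in node_blocks:
        if ip in ipaddress.ip_network(block):
            return None  # the block route already leads to this node
    # Borrowed address: no block route points at this node, so an
    # explicit route for the tunnel address itself is required.
    return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)


# Node with no block of its own, tunnel IP borrowed from another node's block.
print(tunnel_route_needed("172.29.0.62", []))                  # 172.29.0.62/32
# Node whose own block contains its tunnel IP.
print(tunnel_route_needed("172.29.0.50", ["172.29.0.48/28"]))  # None
```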
OK, then I'll start modifying it accordingly. |
I'm facing exactly the same issue, i.e. pods using the host network on a node cannot communicate with non-host-network pods on other nodes, while other communication scenarios are unaffected. Link to details of one of the IPAM blocks: https://gist.github.com/sedflix/95bc34ee4a4fcde98ae93993708c864e Setup:
Within 15 minutes, we added approximately 130 nodes while using Calico 3.20. Within 30 minutes we removed those 130 nodes. This was done twice. |
I think this is a candidate fix: #9662 |
Expected Behavior
The IP of vxlan.calico should not be assigned from the blocks of other nodes.
Current Behavior
I understand that each node should have at least one block. However, when IPs are scarce or the number of nodes is large, it may not be possible to assign a full block to a newly joined node. In that case the new node's vxlan IP may be assigned from another node's block, which makes the new node's vxlan IP unreachable and causes DNS queries from pods on the new node to time out.
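As a small sketch of how a borrowed address can be spotted, the snippet below splits a pool into blocks and looks up which block owns a given IP. The tunnel IP `172.29.0.62` and the block `172.29.0.48/28` come from the reproduction steps in this issue; the pool `172.29.0.0/26` is an assumption inferred from "at most four blocks" of size /28, and `owning_block` is a hypothetical helper.

```python
import ipaddress

# Assumed pool: four /28 blocks, as implied by the reproduction steps.
pool = ipaddress.ip_network("172.29.0.0/26")
blocks = list(pool.subnets(new_prefix=28))


def owning_block(ip_str):
    """Return the block that contains ip_str, or None if outside the pool."""
    ip = ipaddress.ip_address(ip_str)
    for block in blocks:
        if ip in block:
            return block
    return None


print(len(blocks))                  # 4
print(owning_block("172.29.0.62"))  # 172.29.0.48/28
```

If the block returned here has affinity to a different node than the one holding the tunnel address, the address is borrowed.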
Possible Solution
Steps to Reproduce (for bugs)
This means that there are at most four blocks. I now have four nodes, and everything is fine.
A new node then joins (dce-10-29-12-112). Since there are no extra blocks, the vxlan IP of the new node is allocated from the block of another node: 172.29.0.62 belongs to block 172.29.0.48/28.
test-112 is the test pod created on the newly joined node (dce-10-29-12-112).
test-125 is the test pod created on the old node (dce-10-29-12-125).
Context
Your Environment