Expected Behavior
When we add a new node to the cluster, calico-node starts running on it, establishes the BGP connection, and sends an Open Message to negotiate capabilities. We expect the Graceful Restart capability in that message to carry Flag: 0x80 (Preserve forwarding state). However, BIRD sometimes sends Flag: 0x00 instead, so the Graceful Restart capability is not enabled for that session.
Current Behavior
Approximately 20% of nodes send the Open Message with Graceful Restart capability Flag: 0x00.
In our cluster, there are several thousand nodes, so this is a serious problem.
Possible Solution
Restarting the calico-node Pod and reestablishing the BGP connection allows the Graceful Restart capability to be set.
Steps to Reproduce (for bugs)
Run the tcpdump -i bond0 -n -vv 'port 179' -w data.pcap command to capture packets from the interface.
Analyze these packets using Wireshark.
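For checking many captures without opening each one in Wireshark, the flag can also be read programmatically. The following is a minimal sketch, not part of Calico or BIRD: it assumes the raw bytes of a single BGP OPEN message have already been extracted from the capture (for example, copied from Wireshark as a hex stream), and the function name `parse_graceful_restart` is hypothetical. The layout follows RFC 4724: capability code 64, a 2-byte field holding the restart flags and restart time, then one 4-byte entry per address family whose last octet carries the 0x80 forwarding-state bit.

```python
import struct

GR_CAPABILITY_CODE = 64       # Graceful Restart capability code (RFC 4724)
FORWARDING_STATE_BIT = 0x80   # "F" bit in the per-AFI/SAFI flags octet


def parse_graceful_restart(open_msg: bytes):
    """Return Graceful Restart details from one raw BGP OPEN message.

    Returns None if the message carries no Graceful Restart capability.
    Hypothetical helper for illustration only.
    """
    # 19-byte BGP header (marker, length, type) + 10-byte fixed OPEN part.
    if len(open_msg) < 29 or open_msg[18] != 1:
        raise ValueError("not a BGP OPEN message")

    opt_len = open_msg[28]
    params = open_msg[29:29 + opt_len]

    i = 0
    while i + 2 <= len(params):
        p_type, p_len = params[i], params[i + 1]
        body = params[i + 2:i + 2 + p_len]
        i += 2 + p_len
        if p_type != 2:            # optional parameter type 2 = Capabilities
            continue

        j = 0
        while j + 2 <= len(body):
            code, c_len = body[j], body[j + 1]
            value = body[j + 2:j + 2 + c_len]
            j += 2 + c_len
            if code != GR_CAPABILITY_CODE or len(value) < 2:
                continue

            # Top 4 bits: restart flags; low 12 bits: restart time in seconds.
            (head,) = struct.unpack("!H", value[:2])
            families = []
            for k in range(2, len(value) - 3, 4):
                afi, safi, flags = struct.unpack("!HBB", value[k:k + 4])
                families.append({
                    "afi": afi,
                    "safi": safi,
                    "flags": flags,
                    "forwarding_preserved": bool(flags & FORWARDING_STATE_BIT),
                })
            return {
                "restart_flags": head >> 12,
                "restart_time": head & 0x0FFF,
                "families": families,
            }
    return None
```

A node in the failed state would show `forwarding_preserved` as `False` (flags 0x00) for its address family, while a healthy node would show `True` (flags 0x80).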
Context
Failed State:
Successful State:
Your Environment
Calico version: v3.27.2
Calico dataplane (iptables, windows etc.): iptables
Orchestrator version (e.g. kubernetes, mesos, rkt): kubernetes
Operating System and version: Linux 5.10.134-16.3.an8.x86_64 GNU/Linux
I'm not really sure what might trigger BIRD to send that flag, to be honest. Any thoughts?
We do have GR testing that I don't think has ever hit this - functionally graceful restart seems to work in our environments. I wonder if there is some quirk of your BGP environment / ToR that triggers this behavior.