Fix Voq chassis orchagent crash with 34K routes #3329
…nd voq sonic-mgmt suites Signed-off-by: saksarav <[email protected]>
This PR fixes sonic-net/sonic-buildimage#20507, which was introduced by #3269.
@arlakshm for your viz
Signed-off-by: saksarav <[email protected]>
        remote_neigh = nkey
        break

assert remote_neigh != "", "Remote neigh not found in ASIC_DB"
The remote neigh entry will be present. Should we check whether the remote neighbor entry is still the old neighbor? Also, should we add a check that the nexthop is not updated because of addneighbor?
Can we add a check that a new nexthop exists for the new neighbor?
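The checks requested in the review comments above could look roughly like the following. This is a minimal sketch, not the test code in this PR: the helper names `find_neighbor_key` and `nexthop_matches_neighbor` are hypothetical, the sample OIDs are made up, and it assumes that ASIC_DB neighbor keys are JSON strings carrying `ip` and `rif` fields and that nexthop entries expose `SAI_NEXT_HOP_ATTR_IP` and `SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID`.

```python
import json


def find_neighbor_key(neigh_keys, ip):
    """Return the first neighbor key (a JSON string) whose "ip" matches, else ""."""
    for nkey in neigh_keys:
        if json.loads(nkey).get("ip") == ip:
            return nkey
    return ""


def nexthop_matches_neighbor(neigh_key, nexthops):
    """True if some nexthop uses the same IP and router interface as the neighbor."""
    ne = json.loads(neigh_key)
    return any(
        fv.get("SAI_NEXT_HOP_ATTR_IP") == ne["ip"]
        and fv.get("SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID") == ne["rif"]
        for fv in nexthops.values()
    )


# Hypothetical sample data standing in for what a test would read from ASIC_DB.
OLD_RIF = "oid:0x6000000000001"
NEW_RIF = "oid:0x6000000000002"
sample_neigh_keys = [json.dumps({"ip": "10.0.0.1", "rif": NEW_RIF})]
sample_nexthops = {
    "oid:0x4000000000010": {
        "SAI_NEXT_HOP_ATTR_IP": "10.0.0.1",
        "SAI_NEXT_HOP_ATTR_ROUTER_INTERFACE_ID": NEW_RIF,
    }
}

remote_neigh = find_neighbor_key(sample_neigh_keys, "10.0.0.1")
assert remote_neigh != "", "Remote neigh not found in ASIC_DB"
# The entry found should point at the new rif, not the old one.
assert json.loads(remote_neigh)["rif"] != OLD_RIF
# A nexthop should exist for the new neighbor.
assert nexthop_matches_neighbor(remote_neigh, sample_nexthops)
```

Comparing the `rif` of the entry against the old one is what distinguishes "a neighbor with this IP is present" from "the new neighbor is present", which is the distinction the first review comment raises.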
What I did
Don't add the remote system neighbor if the same neighbor exists.
Why I did it
The IMM has two asics, with 2 port channels in each asic and 2 port members in each port channel.
An ip address is configured on each port channel and bgp is enabled. The neighbors and routes are learned on these port channels.
In the sonic-mgmt pc suite, the test case po-update removes the port members from one of the port channels, removes the ip address configured on that port channel, creates a new port channel, and then adds the same port members and the same ip address to the new port channel.
On the remote asic, orchagent tries to remove the neighbor and nexthop for the old port channel before routeOrch has removed all the routes learned on it. Because those routes are still pending, the old nexthop and neighbor are not removed. The neighbor and nexthop for the new port channel are then added. If the neighbor is learned on a remote system port in the remote asic, the nexthop is added with the inband port's alias, so the key (ip, alias) is the same for both the old nexthop and the new nexthop.

When the new nexthop is added, hasNextHop is called to check whether a nexthop with the key (ip-address, alias) already exists. Since the old nexthop has not been removed yet, hasNextHop returns true, but the assert(!hasNextHop) doesn't trigger a crash. So addNextHop replaces the old nexthop (with the old rif-id) with the new nexthop (with the new rif-id) in the nexthop map.

After all the routes learned on the old port channel are removed, the old neighbor and old nexthop are removed. Since the old nexthop was replaced with the new nexthop in the map, orchagent actually deletes the new nexthop from SAI when it tries to delete the old one. When it then tries to remove the old neighbor, SAI returns an error, because the old neighbor is still referenced by the old nexthop in SAI. orchagent crashes when SAI returns the error.
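The sequence above can be reproduced with a small model. This is an illustrative sketch only, not orchagent code: `FakeSai`, `NeighModel`, `run_scenario`, and the alias value `Ethernet-IB0` are hypothetical names chosen for the example, and the `guard` flag stands in for the "don't add the remote system neighbor if the same neighbor exists" behaviour described under "What I did".

```python
class FakeSai:
    """Hypothetical stand-in for SAI: next hops reference neighbors by (ip, rif)."""

    def __init__(self):
        self.next_hops = {}     # nh_id -> (ip, rif_id)
        self.neighbors = set()  # {(ip, rif_id)}
        self._next_id = 1

    def create_neighbor(self, ip, rif_id):
        self.neighbors.add((ip, rif_id))

    def create_next_hop(self, ip, rif_id):
        nh_id = self._next_id
        self._next_id += 1
        self.next_hops[nh_id] = (ip, rif_id)
        return nh_id

    def remove_next_hop(self, nh_id):
        del self.next_hops[nh_id]

    def remove_neighbor(self, ip, rif_id):
        # A neighbor cannot be removed while a next hop still references it.
        if (ip, rif_id) in self.next_hops.values():
            raise RuntimeError("neighbor still referenced by a next hop")
        self.neighbors.remove((ip, rif_id))


class NeighModel:
    """Toy nexthop map keyed by (ip, alias), as in the description above."""

    def __init__(self, sai, guard):
        self.sai = sai
        self.guard = guard
        self.nh_map = {}  # (ip, alias) -> nh_id

    def add_remote_neighbor(self, ip, alias, rif_id):
        key = (ip, alias)
        if self.guard and key in self.nh_map:
            return False  # same neighbor already exists: skip the add
        self.sai.create_neighbor(ip, rif_id)
        # Without the guard this silently overwrites the old map entry.
        self.nh_map[key] = self.sai.create_next_hop(ip, rif_id)
        return True

    def remove_remote_neighbor(self, ip, alias, rif_id):
        nh_id = self.nh_map.pop((ip, alias))
        self.sai.remove_next_hop(nh_id)
        self.sai.remove_neighbor(ip, rif_id)


def run_scenario(guard):
    """Add old, add new with the same (ip, alias), then do the delayed old removal."""
    orch = NeighModel(FakeSai(), guard)
    ip, alias = "10.0.0.1", "Ethernet-IB0"  # hypothetical values
    orch.add_remote_neighbor(ip, alias, "rif_old")
    orch.add_remote_neighbor(ip, alias, "rif_new")
    try:
        orch.remove_remote_neighbor(ip, alias, "rif_old")
    except RuntimeError as exc:
        return f"SAI error: {exc}"
    return "ok"


print(run_scenario(guard=False))  # SAI error: neighbor still referenced by a next hop
print(run_scenario(guard=True))   # ok
```

Without the guard, the delayed removal pops the new nexthop's id from the map, so the old nexthop stays in the fake SAI and blocks removal of the old neighbor. With the guard, the map still points at the old nexthop and the removal completes. How the new neighbor is programmed after the old one is gone is outside the scope of this sketch.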
How I verified it
Ran the sonic-mgmt pc and voq suites and verified that they pass.
Details if related