When performing a switchover (say active to standby or vice versa), we observe the orchagent process going down, which leaves the mux status in an inconsistent state.
Based on observations from the debug logs, we suspected that using the bulker to program routes/neighbors (introduced by PR: #3148) is the problem, and confirmed this by re-running the tests after reverting the PR changes.
Steps to reproduce the issue:
Run any sonic-mgmt test that performs a switchover (e.g. `tests/dualtor_io/test_link_failure.py`), for instance through the `toggle_all_simulator_ports_to_rand_selected_tor` fixture or a similar fixture that performs the switchover during test setup.
Describe the results you received:
Tests fail with `Failed to toggle all ports to <tor_device> from mux simulator` because the mux status is left in an inconsistent state.
```
    def _toggle_all_simulator_ports_to_target_dut(target_dut_hostname, duthosts, mux_server_url, tbinfo):
        """Helper function to toggle all ports to active on the target DUT."""
        ...
        if not is_toggle_done and \
                not utilities.wait_until(120, 10, 0, _check_toggle_done, duthosts, target_dut_hostname, probe=True):
>           pytest_assert(False, "Failed to toggle all ports to {} from mux simulator".format(target_dut_hostname))
E           Failed: Failed to toggle all ports to ld301 from mux simulator
```
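The assertion above fires only after the polling call gives up. As a rough illustration of that pattern, here is a minimal sketch of a poll-with-timeout helper. It assumes the three numeric arguments in the traceback (`120, 10, 0`) mean timeout, polling interval, and initial delay in seconds. `poll_until` is a hypothetical stand-in written for this illustration, not the sonic-mgmt `utilities.wait_until` implementation.

```python
import time


def poll_until(timeout, interval, delay, condition, *args, **kwargs):
    """Hypothetical stand-in for a wait_until-style helper.

    Waits `delay` seconds, then calls `condition(*args, **kwargs)` every
    `interval` seconds until it returns a truthy value or `timeout`
    seconds have elapsed. Returns True on success, False on timeout.
    """
    if delay > 0:
        time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition(*args, **kwargs):
            return True
        time.sleep(interval)
    return False
```

Under that assumption, a toggle check that never succeeds (for example because orchagent is no longer running) would keep the test polling for roughly two minutes before the failure message is raised.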
The orchagent process in the swss Docker container is down (this can be verified with `ps aux` inside the swss container).
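The manual `ps aux` check can be scripted. Below is a minimal sketch that scans captured `ps aux` text for a process by executable name. `is_process_running` is a hypothetical helper introduced for illustration, and it assumes the standard 11-column `ps aux` layout with the command in the last column; how the output is collected from the swss container is left to the caller.

```python
import os


def is_process_running(ps_aux_output, process_name):
    """Return True if `process_name` appears as an executable in `ps aux` text.

    Hypothetical helper: assumes the usual `ps aux` layout, where the first
    line is a header and the 11th column holds the full command line.
    """
    for line in ps_aux_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # blank or malformed line
        command = fields[10].split()
        # Compare against the executable name only, ignoring path and arguments.
        if command and os.path.basename(command[0]) == process_name:
            return True
    return False
```

Calling `is_process_running(output, "orchagent")` on the text returned by `ps aux` inside the swss container would return `False` in the failure state described here.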
Describe the results you expected:
Switchover should have completed without any failures.
Additional information you deem important:
Some of the debug logs captured during the switchover,
Based on the debug logs captured during multiple test runs, we suspected that the use of the `bulker` entity is causing orchagent to go down. We then ran the tests with PR #3148 ([muxorch] Using bulker to program routes/neighbors during switchover) reverted, and the tests pass.