yarpmanager gets (almost) stuck in case the connection to yarpserver is delayed #3104
Comments
I dug a bit into the code. When pressing the "refresh all" button the following code is executed yarp/src/yarpmanager/src-manager/mainwindow.cpp Lines 759 to 775 in e4762b0
which in turn calls yarp/src/yarpmanager/src-manager/applicationviewwidget.cpp Lines 1710 to 1717 in e4762b0
and then yarp/src/yarpmanager/src-manager/applicationviewwidget.cpp Lines 1353 to 1406 in e4762b0
The juicy bit is yarp/src/yarpmanager/src-manager/applicationviewwidget.cpp Lines 1398 to 1400 in e4762b0
that spawns a new thread yarp/src/yarpmanager/src-manager/safe_manager.cpp Lines 467 to 484 in e4762b0
After this, yarp/src/yarpmanager/src-manager/applicationviewwidget.cpp Lines 1355 to 1357 in e4762b0
will always return false. This appears to be the reason why it is not possible to do anything else in the application until the previous action is done.
Instead, when closing the tab, the following code is executed: yarp/src/yarpmanager/src-manager/mainwindow.cpp Lines 795 to 864 in e4762b0
which calls yarp/src/yarpmanager/src-manager/applicationviewwidget.cpp Lines 2393 to 2395 in e4762b0
and then it calls yarp/src/yarpmanager/src-manager/safe_manager.cpp Lines 24 to 29 in e4762b0
This gets stuck in
since it is blocking until the separate thread finishes the run method. Since this is happening in the GUI thread, the whole application gets stuck and the OS tags it as "Not Responding".
A possible idea
In the end, the main issue is that the yarp/src/yarpmanager/src-manager/safe_manager.cpp Lines 95 to 381 in e4762b0
One possibility would be to use a flag (an atomic boolean for example) to interrupt prematurely the execution of the different for loops to make sure that no action causes the entire |
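To make the atomic-flag idea concrete, here is a minimal sketch in plain C++. It is not the SafeManager code: `CancellableWorker`, `start`, `cancelAndWait` and `completed` are hypothetical names used only for illustration, and each "check" is an arbitrary callable standing in for one connection or resource check.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the manager thread; NOT YARP's SafeManager.
class CancellableWorker {
public:
    ~CancellableWorker() { cancelAndWait(); }

    // Runs the checks sequentially on a background thread. The stop flag is
    // tested between two subsequent checks, so a single check is never cut short.
    void start(std::vector<std::function<void()>> checks) {
        cancelAndWait();                 // stop any previous action first
        stopRequested_.store(false);
        completed_.store(0);
        worker_ = std::thread([this, checks = std::move(checks)] {
            for (const auto& check : checks) {
                if (stopRequested_.load()) {
                    return;              // leave the loop early
                }
                check();                 // may block for up to one timeout
                completed_.fetch_add(1);
            }
        });
    }

    // Requests the stop and joins. The join now waits for at most the check
    // that is currently in flight instead of the whole list.
    void cancelAndWait() {
        stopRequested_.store(true);
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    std::size_t completed() const { return completed_.load(); }

private:
    std::atomic<bool> stopRequested_{false};
    std::atomic<std::size_t> completed_{0};
    std::thread worker_;
};
```

With this shape, starting a new action or closing the tab would call `cancelAndWait()`, so the GUI thread blocks for one check at worst rather than for the whole refresh.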
This is a well-known behaviour, but it is extremely difficult to improve. Checking whether a connection is active involves multiple (4?) back-and-forth communications with the server (the same as the 'yarp ping' command), each of them requiring a ping time. So, if the network ping time is 100 ms, each connection to verify might require 4 x 100 ms. If you have 10 connections, that is 4 seconds of delay. No escape. The obvious solution is to keep your ping time to the yarp server in the 10 ms range, or to run the yarpserver elsewhere. PS: fixing the network delay should be your preferred option, because a responsive yarp server with delayed communication is very bad too (and not equally detectable). |
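The arithmetic in the comment above can be written down as a tiny helper. This is only a sketch: `estimateRefreshDelay` is a hypothetical name, and the value 4 for the number of exchanges per check is the uncertain figure quoted in the comment, not something verified against the code.

```cpp
#include <chrono>

// Rough lower bound for a sequential refresh: every connection check needs
// `roundTripsPerCheck` exchanges with the server, each costing one ping time.
constexpr std::chrono::milliseconds estimateRefreshDelay(
    int connections, int roundTripsPerCheck, std::chrono::milliseconds pingTime)
{
    return pingTime * (connections * roundTripsPerCheck);
}
```

For example, `estimateRefreshDelay(10, 4, std::chrono::milliseconds(100))` gives 4000 ms, matching the 4 seconds mentioned above, while a 10 ms ping brings the same refresh down to 400 ms.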
Is it possible to prematurely interrupt a ping without waiting for the timeout time? |
Regarding the possibility of interrupting an ongoing operation, instead, I think that this is certainly doable (between two subsequent checks). |
Yeah exactly, that's what I had in mind. In principle, also in case of connection checks, the maximum waiting time should be 2 seconds, so it might still be a reasonable time to wait. |
No, this is not possible. |
I had a further look at this. The timeout value seems to be used here yarp/src/libYARP_os/src/yarp/os/Network.cpp Lines 753 to 766 in e4762b0
which sets an internal value that is read in yarp/src/libYARP_os/src/yarp/os/impl/SocketTwoWayStream.cpp Lines 48 to 51 in e4762b0
So in the end, the timeout value is given directly to ACE. By reading the documentation of This means that in principle it may be possible to use a fraction of the TIMEOUT value when calling |
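As a generic illustration of the "fraction of the timeout" idea, the sketch below waits in short slices and checks an interrupt flag between them. Everything here is an assumption: `waitInSlices`, `WaitResult` and the `tryOnce` callable are hypothetical names, and whether the underlying ACE call can actually be retried with shorter timeouts in this way is exactly what would still need to be verified.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <functional>

enum class WaitResult { Ready, TimedOut, Interrupted };

// `tryOnce` is a hypothetical callable that blocks for at most the given
// slice and returns true once the operation has succeeded.
WaitResult waitInSlices(const std::function<bool(std::chrono::milliseconds)>& tryOnce,
                        std::chrono::milliseconds totalTimeout,
                        std::chrono::milliseconds slice,
                        const std::atomic<bool>& interrupted)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + totalTimeout;

    // The flag is checked once per slice, so an interrupt request is seen
    // after at most one slice instead of after the full timeout.
    while (!interrupted.load()) {
        const auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - clock::now());
        if (remaining <= std::chrono::milliseconds::zero()) {
            return WaitResult::TimedOut;
        }
        if (tryOnce(std::min(slice, remaining))) {
            return WaitResult::Ready;
        }
    }
    return WaitResult::Interrupted;
}
```

The overall deadline is kept, so the worst-case wait is unchanged; only the reaction time to an interrupt request shrinks to roughly one slice.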
Describe the bug
We have been largely exploiting yarpmanager to run demos involving several applications running on different PCs using different OSes. So far, everything is great. Some difficulty arises when yarpserver is placed in a very delayed network. In this case, clicking the wrong button might render the system unusable. To give an example, if we press the refresh all button, it is not possible to do anything else in the application until all the applications, connections, and resources have been checked. The problem is that, in the case of a delayed network, this operation might require minutes. The problem gets worse if you try to close the application that got stuck, as the entire yarpmanager gets completely stuck.
To Reproduce
One possible way to reproduce it is to set yarp conf to an IP not reachable, and then open yarpmanager without running yarpserver. Then, open the application with the largest number of applications and press on the refresh all button. Then, try to close the blocked application.
Expected behavior
When attempting to run another action while another one is running, yarpmanager should simply stop the previous one. At the same time, if the user wants to close the open application, it should close without blocking the entire yarpmanager.
Screenshots
In the following, I just tried to open an application and then tried to close it immediately after. In this specific case, the issue is that yarpserver is not running, but the same happens when yarpserver is in a delayed network.
YARP.module.manager.2024-04-16.18-26-33.mp4
Configuration (please complete the following information):
Additional context
cc @randaz81 @traversaro @Nicogene @GiulioRomualdi