Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Offboard Timeout #741

Open
kripper opened this issue Dec 8, 2024 · 14 comments
Open

[Feature Request] Offboard Timeout #741

kripper opened this issue Dec 8, 2024 · 14 comments

Comments

@kripper
Copy link

kripper commented Dec 8, 2024

When we use offboard.set_velocity_body() and our python script crashes, the mavsdk server continues to send MAVLINK messages to the drone.

This is not safe.

Please consider setting a timeout to automatically do a system.offboard.stop().
If a new action is performed, the timer should reset.

@JonasVautherin
Copy link
Collaborator

Do you run the embedded mavsdk_server or do you start it manually?

@kripper
Copy link
Author

kripper commented Dec 8, 2024

Embedded and started automagically by the Python wrapper.

@JonasVautherin
Copy link
Collaborator

But then if the python process crashes, it should kill mavsdk_server 🤔. How can it keep running if the process crashes?

@kripper
Copy link
Author

kripper commented Dec 8, 2024

The python code crashed with a python exception, but the server kept running and sending Mavlink messages to the drone, like a stone on the car's gas pedal.

@JonasVautherin
Copy link
Collaborator

This is the magic that starts the embedded mavsdk_server. It starts it as a subprocess, so if the python process crashes, it will stop mavsdk_server.

Now if the python process does not crash, but instead the code doesn't do what it's supposed to do and it results in the offboard commands not stopping, I am not sure how well we can solve that 🤔.

I mean even with DroneKit, you could have a bug where your code keeps sending offboard commands even though it is not what the author of the code expects, right?

@kripper
Copy link
Author

kripper commented Dec 8, 2024

This is the magic that starts the embedded mavsdk_server. It starts it as a subprocess, so if the python process crashes, it will stop mavsdk_server.

I see. I believe we had other tasks running, which prevented the Python code from stopping the mavsdk_server.
We solved the problem by using a timer to call offboard.stop() when no other actions were performed for more than 0.5 seconds.

Maybe the same watchdog timeout logic could be implemented on the server side and enabled/disabled via python.

Do you see any reason why not enabling it be default?

@JonasVautherin
Copy link
Collaborator

JonasVautherin commented Dec 9, 2024

I am not completely convinced that the watchdog will always do what users expect 🤔. What if some client genuinely doesn't perform any action for more than 0.5 second? Also I am not sure it will make it a lot safer: I can still imagine bugs in the client code that will make offboard do wrong things 😕.

@julianoes what do you think?

@kripper
Copy link
Author

kripper commented Dec 9, 2024

The watchdog will basically remove the stone from the gas pedal when it detects the pilot is not responding anymore.

DJI drones require to send velocity command within 50hz, otherwise the action is ignored. And this cannot be disabled.

@JonasVautherin
Copy link
Collaborator

when it detects the pilot is not responding anymore

Wait, are you using offboard for the manual input (i.e. sticks) from the pilot? Shouldn't you use the corresponding MAVLink messages for that? It already exists in MAVSDK, that's the manual_control plugin (available in Python).

I think this will solve your issue 👍.

@kripper
Copy link
Author

kripper commented Dec 9, 2024

Yes. RosettaDrone is like a FlightController that converts all Mavlink commands to joystick commands on cheap drones that only support that...but this is how it works under the hood.

In general, we want to support all Mavlink messages/apps. In this case, I was translating a Dronekit script and noticed this (what I consider) dangerous behavior.

@kripper
Copy link
Author

kripper commented Dec 9, 2024

that's the manual_control

Yes. That is very similar to what DJI is doing in terms of handling a timeout or watchdog.

I understand your point is that set_velocity_body() is for setting a persistent setpoint.

If the pilot is a human (joystick) or a robot (set_velocity_body), in both cases there is the risk of a communication breakdown between components (between the wrapper and the server).

Maybe the desired behavior for a plane is to continue flying with same velocity, while for a drone it is to stop (because it can and it's safer).

Anyway, I wouldn't mind if the plane reduces velocity, keeps altitude and/or loiters when there is no communication with the pilot or autopilot.

@JonasVautherin
Copy link
Collaborator

in both cases there is the risk of a communication breakdown between components (between the wrapper and the server).

But that is not what happened in your case, right? In your case, your python process did not crash (otherwise mavsdk_server would have been stopped). Rather your code "hanged" in a situation where it shouldn't have.

To me it's pretty similar to some code that would do:

keep_sending_setpoints = true

while (keep_sending_setpoints) {
    <use offboard to send setpoints>
}

if (user_hit_stop) {
    keep_sending_setpoints = false
}

You can see that the while loop will never stop, because the code is wrong. But we can't really have a watchdog that prevents that, can we?

You suggested having some kind of watchdog for your specific code error ("stop after 0.5s"), but that will not work for every possible use-case of the offboard code.

But if, instead of using offboard, you use manual_control, then I think it will behave the way you want! I would recommend against using offboard (which is dangerous) when what you actually want is manual_control 🤔.

@kripper
Copy link
Author

kripper commented Dec 9, 2024

In my case, the pilot routine hung because it generated an Exception and there were active tasks running in other threads => the script didn't exit => the MAVSDK server kept running => the drone continued flying out of control.

The same would happen if you mistakenly pause the pilot routine with a sleep(100).

I implemented a timer to stop the motion when no new actions are performed (on the python side).
The same could be implemented in the MAVSDK server side.

@julianoes
Copy link
Collaborator

I think it's up to the MAVSDK user to make sure exception are handled properly and you don't end up in the situation you describe. I understand that's not necessarily easy, especially with Python asyncio, so I get where you're coming from.

I don't think that we ought to build such safe guards around the gRPC API because we'd have to do it everywhere. And if we only added it here, that wouldn't be very consistent, I think.

One more thought though, we could change the MAVSDK offboard API to require you to constantly send setpoints and not do that automatically. That would also make it consistent with the manual control API, and remove the start/stop edge cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants