Frequency drops with additional subscriber #461
Comments
Similar behavior was described here: https://answers.ros.org/question/363978/ros2-image-publisher-slows-down-on-subscription/

Hi! I am facing the same problem on my robot (though it is not reproducible on my dev laptop, even with an identical software configuration). What networking interfaces do you have on the machine where the issue happens? Is ROS 2 running behind any of them?
I'm also able to reproduce the issue with
But again, that is only reproducible on the robot, not locally.
That configuration is the same for the robot computer and the local one. Also, no XML config was used for DDS.
Does anyone know how to fix this issue? I tried https://docs.ros.org/en/rolling/How-To-Guides/DDS-tuning.html but it did not work.
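For context, the Cyclone-specific part of that guide is mostly about socket receive buffers; a minimal sketch of the kernel-side change on Linux (the sizes are illustrative, not taken from this issue):

```
# Sketch: raise the maximum and default socket receive buffer sizes (Linux).
# Sizes are illustrative; the DDS-tuning guide discusses suitable values.
sudo sysctl -w net.core.rmem_max=8388608
sudo sysctl -w net.core.rmem_default=8388608
# Persist across reboots by adding the same keys to a file in /etc/sysctl.d/
```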
Thank you for using Cyclone DDS! My first assumption would always be what it says in https://docs.ros.org/en/rolling/How-To-Guides/DDS-tuning.html#cyclone-dds-tuning, because Cyclone defaults to requesting a 1MB socket receive buffer, so a 3MB message is not unlikely to overrun it. Of course, a 500kB message would easily fit in 1MB of buffer space ... Another thing that comes to mind is the possibility that it is using multicast on your "normal" machine, but unicast on the robot. There are a number of possible reasons for that; one of them is using a WiFi interface vs a wired network interface. Without having tried it out, it seems self-evident that sending the data multiple times will slow it down. If the publisher and the subscriber are both on the same machine, there is also the possibility that for one subscriber the kernel routes the data over the loopback interface, but that from the second subscriber onwards it starts using multicast (if it deems the network interface suitable for that), and that this forces the process to slow down to the rate of the selected network interface. At least I suspect so ... The unicast/multicast decisions are easily seen in a Wireshark capture, but for me it would be more convenient if you could grab the actual configuration of Cyclone DDS from the system and paste the relevant bits here. That you can do by:
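One way to do that is via Cyclone DDS's Tracing settings; a minimal sketch (the talker is only an example node, any ROS 2 node will do):

```
# Sketch: dump Cyclone DDS's effective configuration to stderr.
# Verbosity "config" prints the settings plus the network interface selection.
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
export CYCLONEDDS_URI='<Tracing><Verbosity>config</Verbosity><OutputFile>stderr</OutputFile></Tracing>'
ros2 run demo_nodes_cpp talker
```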
and/or the same for the subscriber. You'll see a list of all the settings (but given you're using default settings, that's probably not all that interesting) followed by a bit of information on network interfaces (the interesting part). E.g.:
(See the fourth line I quoted: "presumed flaky multicast". This is indeed my laptop's WiFi.) There is a lot more to be found if you put in a higher tracing verbosity. Secondly, ddsperf can give an independent measurement; from such a run:
(This is from a debug build on macOS.) The lines with "discarded", "rexmit", etc. say something about what it is doing. The CPU loads tell you how much of the CPU it occupies. Together it tells a bit more than the normal output. I'd be curious to see what it says. I don't know if these extra bits of information will be sufficient for figuring out the cause, but it is easy enough to be worth a try.
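For anyone who wants to try it, a sketch of a ddsperf run along the lines discussed (payload size and duration are illustrative, and the exact option spelling may differ between Cyclone DDS versions):

```
# Sketch: publish large samples and add subscribers one at a time,
# watching the rate ddsperf reports. Run each command in its own shell.
ddsperf -D 30 pub size 3145728   # publisher with ~3 MB payloads, run 30 s
ddsperf -D 30 sub                # first subscriber
ddsperf -D 30 sub                # second subscriber; watch for the rate drop
```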
I tried running with extended tracing. My laptop's highest "quality" interface is wireless, so only spdp is enabled. But we have a somewhat trickier network setup on the robot (all connections are Ethernet):
So the "best" interface on the robot's computer is wired so multicast is used at full potential Thank you very much for your help! The one thing I'm not sure about is how AllowMulticast == I started the software stack on the robot and I do not see any difference compared with full multicast, everything is working fine (including the second ROS2 device (stm) in the network). Maybe I'm just not fully understanding the purpose of multicast... What ROS-related features do I lose when using just spdp? |
It usually doesn't make a big difference: with the spdp setting, multicast is still used for participant discovery, so discovery keeps working as before. What you do get is that in cases where data from one writer needs to go to many readers in different processes (also on the same node), you get many unicast messages instead of one multicast, and there is a cross-over point where multicast really is better despite bothering some processes with data they don't need. That downside of sending data where it isn't needed can be pretty bad, and in the default configuration it is quite likely to happen, for two reasons:
Back in the day, shared use of a network wasn't a thing, at least not in the world this protocol stack originated in. Sending the data everywhere in practice also usually wasn't an issue, and a bit of configuration tweaking took care of the rest. There is a solution to these things if you do want to use multicast for some topics. Instead of setting
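For reference, the global setting discussed above can be expressed as a small Cyclone DDS XML fragment; a minimal sketch, assuming the standard configuration schema (the file path is up to you):

```xml
<!-- Sketch: restrict multicast to participant discovery (SPDP) only.
     Activate with: export CYCLONEDDS_URI=file:///path/to/cyclonedds.xml -->
<CycloneDDS>
  <Domain Id="any">
    <General>
      <AllowMulticast>spdp</AllowMulticast>
    </General>
  </Domain>
</CycloneDDS>
```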
I just tested it with normal multicast and spdp on our robot and the difference is quite significant. I think this is the issue. Running ddsperf in the earlier described configuration results in 754.97 Mb/s with spdp, independent of the subscriber number. Using normal multicast results in a drop to 25.17 Mb/s for all participants if a second subscriber is created. All the traffic stays on the robot, and a bridge interface is used as the network interface. Thanks for the troubleshooting, everybody, and sorry for the late answer from us at @bit-bots.
Bug report
Required Info:
Steps to reproduce issue
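A hypothetical reproduction in the spirit of the report (node names, topic name, and image size are illustrative, not taken from the issue):

```
# Hypothetical sketch: large image topic with several subscribers.
export RMW_IMPLEMENTATION=rmw_cyclonedds_cpp
# Terminal 1: publish a synthetic image stream (~6 MB raw frames at 1920x1080)
ros2 run image_tools cam2image --ros-args -p burger_mode:=true -p width:=1920 -p height:=1080
# Terminal 2: first subscriber; note the reported rate
ros2 topic hz /image
# Terminal 3: second subscriber; the reported rate drops for both
ros2 topic hz /image
```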
Expected behavior
The frequency should not drop.
Actual behavior
The frequency drops.