Race in caching behaviour when smcroute is used to combine multicast feeds #143

Open
gsmecher opened this issue Jan 28, 2020 · 7 comments


@gsmecher

I'm using smcroute on a number of embedded boards, each of which forwards multicast data from a private source interface to a single multicast destination on a shared subnet.

Because Linux + smcrouted results in IFF_ALLMULTI being set on all affected interfaces, this means smcrouted sees multicast packets from the shared subnet (the destination, which we don't really want) as well as the source. This sets up a race between receiving a "correct" multicast packet (from our private interface) and an "incorrect" packet (a packet forwarded by another board on the public interface).

If the first packet comes from the public interface, mroute4_dyn_add adds a poison-pill route so such packets are not forwarded in the future. As a result, we never forward any multicast traffic.

If the first packet comes from the private interface, a correct cache entry is entered and we do forward traffic.
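The race described above can be illustrated with a toy model. This is only a sketch of the behaviour as reported in this issue, not smcrouted's actual code: the `Router` class, the interface names `priv` and `pub`, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy model: one (*,G) rule plus a per-(S,G) forwarding cache."""
    inbound: str                     # interface named in the (*,G) rule
    outbound: tuple                  # interfaces the rule forwards to
    mfc: dict = field(default_factory=dict)  # (source, group) -> (iif, oifs)

    def receive(self, iif, source, group):
        """Return the interfaces this packet is forwarded to."""
        key = (source, group)
        if key not in self.mfc:
            # First packet for this (S,G): whichever interface it arrived
            # on decides the cache entry. A non-matching interface yields
            # an entry with no outbound interfaces (the "poison pill").
            oifs = self.outbound if iif == self.inbound else ()
            self.mfc[key] = (iif, oifs)
        cached_iif, oifs = self.mfc[key]
        # Only packets arriving on the cached inbound interface are forwarded.
        return oifs if iif == cached_iif else ()


S, G = "192.168.154.2", "225.1.2.3"

# Public packet wins the race: nothing is ever forwarded for this (S,G).
lost = Router("priv", ("pub",))
lost.receive("pub", S, G)
print(lost.receive("priv", S, G))

# Private packet wins the race: forwarding works.
won = Router("priv", ("pub",))
print(won.receive("priv", S, G))
```

The model keys the cache on the (S,G) pair alone, which is why the outcome depends entirely on which interface delivers the first packet.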

I also can't use "phyint eth0 disable" on the external interface since this also prevents forwarding.

I am unsure if this is a problem with our network topology or a corner-case for smcrouted. In a perfect world, I'd avoid IFF_ALLMULTI on the external interface (I don't want the CPU to see extra traffic on the external interface anyway!)

#27 would be perfect, although that report was a performance optimization and this feels more like a bug.

@troglobit
Owner

troglobit commented Jan 28, 2020

Hi, always great to hear from users of SMCRoute :)

The support for (*,G) routing in SMCRoute is crude and does not work for all possible use-cases. Yours is one such case, as you've discovered the hard way. The (*,G) mechanism has actually been grafted from mrouted, and is known in the kernel as IGMPMSG_NOCACHE, but it does not handle the case when the group suddenly arrives on a different interface. The latter is the kernel message IGMPMSG_WRONGVIF. Adding support for this is quite a bit of work, and would move SMCRoute significantly further away from its roots as a static multicast routing daemon.
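For readers unfamiliar with these kernel messages, here is a rough sketch of how a routing daemon could tell the two apart when reading from the multicast routing socket. It assumes the `struct igmpmsg` layout from the Linux `mroute.h` header (eight unused bytes, then message type, a must-be-zero byte, the VIF index, one more byte, then source and group). The function name `parse_upcall` is made up for this example, and the sketch ignores the extra byte that newer kernels use for high VIF bits.

```python
import socket
import struct

# Message types from the Linux mroute.h header.
IGMPMSG_NOCACHE = 1    # no forwarding cache entry for this (S,G)
IGMPMSG_WRONGVIF = 2   # (S,G) arrived on an unexpected interface


def parse_upcall(buf):
    """Decode a kernel upcall, or return None for an ordinary IGMP packet."""
    if len(buf) < 20:
        raise ValueError("buffer too short for struct igmpmsg")
    # Bytes 8..10 overlay the IP header's TTL, protocol and checksum fields.
    msgtype, mbz, vif = struct.unpack_from("BBB", buf, 8)
    if mbz != 0:
        # The protocol field is non-zero in real packets, zero in upcalls.
        return None
    return {
        "type": msgtype,
        "vif": vif,
        "source": socket.inet_ntoa(buf[12:16]),
        "group": socket.inet_ntoa(buf[16:20]),
    }


# Hand-built example message: WRONGVIF for (192.168.154.2, 225.1.2.3) on VIF 1.
msg = (bytes(8) + bytes([IGMPMSG_WRONGVIF, 0, 1, 0])
       + socket.inet_aton("192.168.154.2") + socket.inet_aton("225.1.2.3"))
print(parse_upcall(msg))
```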

The IFF_ALLMULTI flag you mention is actually a kernel thing. All user space multicast routing daemons only open the kernel multicast routing socket, send MRT_INIT and then start enumerating interfaces intended for forwarding. These are called Virtual Interfaces, or VIFs. When an interface is enumerated Linux sets the IFF_ALLMULTI flag ... there is much to be said about this.
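To check whether the kernel has set the flag on a given interface, the flags word exposed in sysfs can be decoded. A minimal sketch, assuming a Linux system with `/sys/class/net/<ifname>/flags` available; the helper names are hypothetical, the flag values are the ones from `linux/if.h`.

```python
# Interface flag bits from linux/if.h.
IFF_UP = 0x1
IFF_PROMISC = 0x100
IFF_ALLMULTI = 0x200
IFF_MULTICAST = 0x1000


def parse_flags(text):
    """Parse the hex string found in /sys/class/net/<ifname>/flags."""
    return int(text.strip(), 16)


def has_allmulti(flags):
    """True when the interface receives all multicast frames."""
    return bool(flags & IFF_ALLMULTI)


def read_flags(ifname):
    """Read the current flags word for an interface (Linux only)."""
    with open(f"/sys/class/net/{ifname}/flags") as f:
        return parse_flags(f.read())


# Example values: UP|BROADCAST|MULTICAST, without and with ALLMULTI.
print(has_allmulti(parse_flags("0x1003\n")))
print(has_allmulti(parse_flags("0x1203\n")))
```

Comparing `read_flags("eth0")` before and after smcrouted starts should show the `IFF_ALLMULTI` bit appearing once the interface is enumerated as a VIF.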

As a way forward I'd start with:

  1. Test if you can add some glue script to watch for "poison routes" and then call smcroutectl flush, it should work ...
  2. Sit down and have a look at the IGMPMSG_WRONGVIF code path. A workaround, possibly initiating flush, should be possible.
  3. Maybe set up an iptables drop rule in the prerouting chain on the public interface?
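A glue script along the lines of the first suggestion could look roughly like this. It is a sketch under several assumptions: that poison routes show up in `/proc/net/ip_mr_cache` as resolved entries with an empty `Oifs` column, that the file uses the usual `Group Origin Iif Pkts Bytes Wrong Oifs` layout with addresses printed as host-order hex words, and that flushing is an acceptable reaction. The helper names are made up, and the race can of course recur after each flush.

```python
import subprocess
import sys
import time


def decode_addr(hexword, byteorder=sys.byteorder):
    """Turn a hex word from ip_mr_cache into dotted-quad notation."""
    return ".".join(str(b) for b in int(hexword, 16).to_bytes(4, byteorder))


def find_stop_filters(text, byteorder=sys.byteorder):
    """Return (source, group, iif) for entries with no outbound interfaces."""
    found = []
    for line in text.splitlines()[1:]:          # skip the header line
        fields = line.split()
        if len(fields) < 6:
            continue
        group, origin, iif = fields[0], fields[1], int(fields[2])
        oifs = fields[6:]                       # empty for a stop-filter
        if iif >= 0 and not oifs:               # iif -1 means unresolved
            found.append((decode_addr(origin, byteorder),
                          decode_addr(group, byteorder), iif))
    return found


def watch(interval=5.0, path="/proc/net/ip_mr_cache"):
    """Poll the kernel cache and flush dynamic routes when poisoned."""
    while True:
        with open(path) as f:
            if find_stop_filters(f.read()):
                subprocess.run(["smcroutectl", "flush"], check=True)
        time.sleep(interval)
```

Calling `watch()` as root on the board would poll every five seconds; a tighter loop or an event-driven approach may be needed if traffic must not be dropped for that long.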

Edit: backtick fixups to (*,G) to prevent italic markup in the paragraph.

@gsmecher
Author

My short-term path forward is to re-break the IFF_ALLMULTI support in my kernel. Our older kernel's Ethernet driver ignored the flag, and we lived blissfully in ignorance. We upgraded our kernel and everything fell apart. Breaking the driver again is simple, and ensures we maintain hardware packet filtering.

I forgot to add the first and last part of the bug report: we've gotten a long way with smcrouted, and will continue to. Thank you, thank you, thank you. (You may be happy to hear that smcroute handles readout packets for the South Pole Telescope in Antarctica.)

@troglobit
Owner

Sounds like a plan! Having IFF_ALLMULTI optional for userspace when multicast routing is enabled is sort of the patch I wanted to submit upstream, but never got around to ... and then the original patch got lost along the way.

Really glad to see the work I've put in pay off for others as well. It's really awesome to hear stories like this! :-)

troglobit added a commit that referenced this issue Aug 9, 2021
This test verifies that stop-filter, or "poison pill", routes work as
intended.  I.e., unknown inbound multicast is blocked and only such
that is actually wanted is properly routed.

Example:

    mroute from eth0 group 225.1.2.3 to eth2
    mroute from eth1 group 225.1.2.3 to eth2

These two multicast routes are dynamically installed in the kernel MFC
when any source originating from eth0 or eth1 with a matching group is
intercepted by the kernel and smcrouted is notified (NOCACHE msg).

If multicast to the same multicast group comes in on eth2, smcrouted
adds a stop-filter route, i.e., a route with no outbound interfaces, to
prevent further NOCACHE upcall messages from the kernel.  This does not
affect any flow in the intended direction, established before or after
the stop-filter is created.

However, should source `S` from eth0 suddenly appear on eth2, this would
be considered a WRONGVIF event, which is not handled by smcrouted.  This
typically occurs when a layer-3 topology change is taking place.  For the
moment, users are recommended to either try pimd/pim6sd, or add a switch
with multicast snooping between eth2 and the rest of the network.  When
a snooping switch is active it only forwards multicast to eth2 when it
has sent a join (mgroup) for a given (source and) multicast group.

Issue #143

Signed-off-by: Joachim Wiberg <[email protected]>
@troglobit
Owner

I've been coming back to this issue a couple of times over the last year or so. It's annoying, to say the least. After a couple of weeks of refactoring the code base this summer (to better support IPv6 on an equal footing), I've also added a few tests to verify the behavior of smcrouted.

If you're still out there @gsmecher, could you perhaps confirm that when you say "... this means smcrouted sees multicast packets from the shared subnet (the destination, which we don't really want) as well as the source.", you mean the packets from the shared subnet are looped back, i.e., the same source address? Because if it's not the same (S,G) pair being looped back, the new poison.sh test should prove that smcrouted can at least handle the basic case:

Example:

    mroute from eth0 group 225.1.2.3 to eth2
    mroute from eth1 group 225.1.2.3 to eth2

These two multicast routes are dynamically installed in the kernel MFC when any source originating from eth0 or eth1 with a matching group is intercepted by the kernel and smcrouted is notified (NOCACHE msg).

If multicast to the same multicast group comes in on eth2, smcrouted adds a stop-filter route, i.e., a route with no outbound interfaces, to prevent further NOCACHE upcall messages from the kernel. This does not affect any flow in the intended direction, established before or after the stop-filter is created.


I'm still pondering how to handle the WRONGVIF case, where the same (S,G) pair suddenly arrives on another interface. I'll likely put that on the TODO list and try to get the v2.5 release out.

@gsmecher
Author

gsmecher commented Aug 9, 2021

Hi Joachim,

Yup, I'm still out here! I'm a multicast user, not a multicast expert, so my understanding here is limited. I'll describe our topology, then explain the problems we saw.

We have 2 subnets:

  • 192.168.154.x/24: a private link between a single FPGA and a single ARM SoC on each of ~100 embedded boards.
  • 192.168.0.x/24: a shared subnet. This is where multicast clients need to receive data from all boards.

Each ARM SoC bridges these networks using smcroute. Packets are received on the private (.154.x) link and forwarded to the shared (.0.x) subnet.

Nothing in the current stack rewrites SRCIP fields when forwarding packets from the private subnet to the multicast address on the shared subnet. Hence, we do have colliding (S, G) pairs and we're in your "WRONGVIF case". (I just checked, and wireshark shows SRCIP 192.168.154.2 on the shared subnet.) This seems wrong for us, and does cause downstream problems with "martian packet" filters. (Aside: whose job would rewriting SRCIP be? I can probably manage it with iptables, but I would have expected it to be within the forwarding daemon's scope.)

(edit: on reflection, I think I'm describing something like masquerading and not ordinary forwarding/routing.)

Our (abominable) workaround is doing its job, so we're in no rush for a fix. If it is helpful, I'll happily be a test subject. One comment: it's important (to us, because our SoC is limited and currently does hardware packet filtering) that the network stack does not inspect all multicast traffic on the shared interface.

thanks again,
Graeme

@troglobit
Owner

Hi Graeme, great to hear you're still with us! :)

Thank you for the detailed explanation, I understand much better now what's going on. Your use-case is remarkably similar to what we do at Westermo for some customers. For example, on board trains, each train car (or consist) is a 10.0.0.0/24 subnet with a lot of multicast senders transmitting to a shared LAN, the train backbone, which is 192.168.x.0/18 or something. Each train car is 1:1 NATed and has a unique train-wide address in the 192.168 range.

So, I'll whip up a test case for this in SMCRoute and document it as well as I can.

You're right, SMCRoute is a very simple program. It acts as a very basic user interface to the kernel MROUTING stack, which acts as a forwarder of frames, nothing more. To do more interesting stuff you can have a look at the iptables 1:1 NAT, called NETMAP, or use masquerading. I think that should also solve your other issues. I'll make sure to include this in the test case as well.

While googling this yesterday I ran into this fellow, who seems to have pretty much the same use case as well. I'll make sure to respond to him too as soon as I have something to show for it.

https://serverfault.com/questions/1069555/iptables-netmap-not-reliably-adjusting-source-address-of-multicast-udp-packets

@troglobit
Owner

OK, so I've made a little test now to both verify the functionality and demonstrate how to use 1:1 NAT with SMCRoute. It uses network namespaces, VETH pairs, a bridge, and a couple of dummy interfaces as its basic building blocks. SMCRoute and iptables do the rest.

https://github.com/troglobit/smcroute/blob/master/test/multi.sh

         netns: R1                         netns: R2
        .-------------.                   .-------------.
        |  smcrouted  |                   |  smcrouted  |
        |    /   \    |       br0         |    /   \    |
   MC --> eth1   eth0 |      /   \        | eth0   eth1 <-- MC
        |            `------'     '--------'            |
        '-------------'  192.168.0.0/24   '-------------'
          10.0.0.0/24                       10.0.0.0/24

Each network namespace has two interfaces, eth1 is the dummy interface and eth0 is one end of a VETH pair, where the other end is bridged in the root namespace, in br0.

| netns | interface | address      |
|-------|-----------|--------------|
| R1    | eth0      | 192.168.0.10 |
| R1    | eth1      | 10.0.0.1     |
| R2    | eth0      | 192.168.0.20 |
| R2    | eth1      | 10.0.0.1     |

With iptables we set up NETMAP to translate the private 10.0.0.0/24 network on the left to a global 192.168.10.0/24 and the right to a global 192.168.20.0/24.
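The address arithmetic behind a 1:1 mapping like this can be sketched in a few lines. This models only what NETMAP does to an address (keep the host bits, swap the network bits). It is not the iptables rule itself, and the `netmap` function is a hypothetical helper written for this illustration.

```python
import ipaddress


def netmap(addr, private, public):
    """Map addr from the private network onto the public one, 1:1."""
    addr = ipaddress.ip_address(addr)
    private = ipaddress.ip_network(private)
    public = ipaddress.ip_network(public)
    if private.prefixlen != public.prefixlen:
        raise ValueError("1:1 mapping needs equally sized networks")
    if addr not in private:
        return addr                      # outside the mapped range: untouched
    host_bits = int(addr) & int(private.hostmask)
    return ipaddress.ip_address(int(public.network_address) | host_bits)


# The same private sender address becomes unique per namespace.
print(netmap("10.0.0.1", "10.0.0.0/24", "192.168.10.0/24"))   # left side, R1
print(netmap("10.0.0.1", "10.0.0.0/24", "192.168.20.0/24"))   # right side, R2
```

With the mapping applied, the two senders that share 10.0.0.1 no longer produce colliding (S,G) pairs on the shared subnet.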

With SMCRoute we set up a multicast (*,G) route (the config file is shared between the two smcrouted instances) from eth1 to eth0.

Two multicast streams are injected on eth1 in each netns. We use different TTLs (bigger than 2) to both differentiate between the streams when we analyze the pcap file collected on br0, and to be able to check for multicast routing loops (due to the 1:1 NAT).
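The TTL check can be pictured as follows. This is a sketch of the idea only, not the logic in multi.sh: it assumes each router decrements the TTL by one when forwarding, so a packet captured on br0 should carry exactly the injected TTL minus one, and anything lower suggests the packet was routed more than once. The function name, data shapes and sample values are all hypothetical.

```python
def check_capture(observed, injected_ttl):
    """Compare packets seen on the bridge against the injected streams.

    observed:     iterable of (source, group, ttl) tuples from the capture
    injected_ttl: dict mapping (source, group) to the TTL used at injection,
                  keyed by the source address as seen after NAT
    Returns a list of human-readable problems; empty means all is well.
    """
    problems = []
    for source, group, ttl in observed:
        expected = injected_ttl.get((source, group))
        if expected is None:
            problems.append(f"unexpected stream {source} -> {group}")
        elif ttl < expected - 1:
            problems.append(f"possible loop: {source} -> {group} ttl {ttl}")
        elif ttl != expected - 1:
            problems.append(f"not routed: {source} -> {group} ttl {ttl}")
    return problems


streams = {("192.168.10.1", "225.1.2.3"): 3,
           ("192.168.20.1", "225.1.2.3"): 4}
capture = [("192.168.10.1", "225.1.2.3", 2),
           ("192.168.20.1", "225.1.2.3", 3)]
print(check_capture(capture, streams))
```

Using a distinct TTL per stream also means a packet's origin can be identified from the capture even if the source addresses were to collide.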


None of this fixes your root problem, which we discussed before, related to inbound traffic from your shared subnet. So I guess your workaround is the way to go for you, for now. Just FYI, there has been a lot of work on the tc subsystem in the kernel, including loading eBPF filters to bypass the complete routing stack right after the NIC driver has received the frame, and even offloading certain tc filters to hardware, like TCAMs. All of these are promising, but outside the scope of SMCRoute.

I really hope I've got the full picture and this helps you going forward!
