lib/route: add initial support for br_vlan global_opts module #407

ronand-atl · 2024-10-01T20:30:10Z

I am working on adding initial support for a new bridge vlan module for libnl in order to support RTM_{NEW,DEL,GET}VLAN messages. Our initial need is to be able to set bridge vlan global options, which are sent with the BRIDGE_VLANDB_GLOBAL_OPTIONS nested attribute.

I would like to design the initial support such that extending the module to support BRIDGE_VLANDB_ENTRY, GET and DEL requests, and caching is possible.

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

lib/route/br_vlan.c

thom311 · 2024-10-02T07:14:14Z

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them.

Overall, the patch seems to be on the right track (I think!). Thank you.

KanjiMonster · 2024-10-11T13:40:11Z

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.

Maybe we can find a way of combining the effort.

KanjiMonster · 2024-10-11T13:48:20Z

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them.

Overall, the patch seems to be on the right track (I think!). Thank you.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about.

So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.

So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

ronand-atl · 2024-10-14T04:50:26Z

@KanjiMonster Hey, thanks for reaching out :)

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.

Maybe we can find a way of combining the effort.

Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the build_bridge_vlan_msg function doesn't seem to do anything? Maybe that part still needs to be implemented.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about.

So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.

So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module route/br_vlan/global_opts.c. I'm working on a revised version which I should be able to push soon. If they are different modules then there shouldn't be too much overlap. If we want to implement caching then that looks like it might take extra work, demultiplexing nested attributes in messages sent from the kernel through a common br_vlan module, or having both modules process them and ignore nested attributes it doesn't care about.

If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once.

KanjiMonster · 2024-10-14T09:38:06Z

@KanjiMonster Hey, thanks for reaching out :)

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.
Maybe we can find a way of combining the effort.

Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the build_bridge_vlan_msg function doesn't seem to do anything? Maybe that part still needs to be implemented.

Right, we are only listening to netlink, not setting anything, so we only cared about getting notified about changes part (and being able to look stuff up).

For context, we are listing to changes in fdb/bridge/routes/etc to write the state/configuration down to a switch ASIC so these things get hardware accelerated in the background. A bit like switchdev, but in userspace.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about.
So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.
So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module route/br_vlan/global_opts.c. I'm working on a revised version which I should be able to push soon. If they are different modules then there shouldn't be too much overlap. If we want to implement caching then that looks like it might take extra work, demultiplexing nested attributes in messages sent from the kernel through a common br_vlan module, or having both modules process them and ignore nested attributes it doesn't care about.

But caching is the interesting part in libnl ;-). And the complicated one, since IIRC libnl only allows one module per message type, so we can't have two modules for RTM_NEWVLAN. So we would need to change libnl to allow that, but at the same time it would also benefit the mdb implementation.

And I just remembered my "programming is hard, let's go shopping" moment when I initially looked at implementing it:

The kernel does not send out a initial global opts message when a new vlan group is created (the first time a vlan is assigned to any bridge member), only if any global options are changed it sends out one. So you won't know that a vlan is assigned MSTI 0 unless you explicitly ask, or change any other option.

Likewise, the kernel does not send out a RTM_DELVLAN for global opts if a vlan group is removed (the last member of a vlan is removed). This resets all vlan global opts, but user space won't know unless userspace does reference tracking. Or tries to get the options for the vlan, which will then result in an error.

And lastly, kernel does not send out any messages to userspace about STP state changes for MSTI, or VLAN STP state changes as a result of them.

Maybe I should just send in kernel patches for these instead of trying to come up with a solution for this in userspace ... .

If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once.

I can't do any promises on how much time I can spend on it, but apart from that we can do that.

ronand-atl · 2024-10-30T23:25:26Z

I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support.

Some more testing will need to be done to ensure everything works as expected before merging.

KanjiMonster · 2024-10-31T10:30:11Z

I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support.

Some more testing will need to be done to ensure everything works as expected before merging.

How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects?

The latter would be rather inefficient, and I would like to avoid that. Ideally there is one object per {vlan,port,bridge}.

ronand-atl · 2024-10-31T21:42:26Z

How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects?

I'm not 100% sure what you mean. Could you elaborate? Are you talking about the oo_update function? Currently that's written to just handle the response of a dump request, and not update notifications. To handle update notifications it would need to merge the new entries into the existing object (adding or replacing entries). A two pointer strategy could be used do the operation in O(n + m) time without needing to modify the data structure.

Ideally there is one object per {vlan,port,bridge}.

I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan.

KanjiMonster · 2024-11-01T09:40:05Z

I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan.

This is the core issue, you can't have multiple object types in one cache. Either the cache has global vlan objects, or it has vlan entry objects. It cannot have both.

And you cannot have more than one cache per netlink message type. Once you register your global vlan options object type for RTM_{NEW,DEL}VLAN, you cannot add one for vlan entry objects, and vice versa.

My best Idea so far had been to have a "type" attribute for the object, which tells whether it is a global opts or a vlan entry type. So the id attributes become {ifindex,vid,type}, and depending on the attributes in the RTM_{NEW,DEL}VLAN you would update/create/remove different type of objects.

There is no issue in sending multiple object notifications from one message, libnl handles that fine.

Then maybe we also keep a reference of the vid entry in the gopts, as gopts live only as long as there are vlan members, once the last member is removed, the gopts get implicitly reset to defaults (but there is no explicit netlink message for that).

KanjiMonster · 2024-11-01T10:34:07Z

Though in general as a quick glance review for this PR:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.
You are not registering for any netlink message types or any netlink message groups, so the message parser is never invoked. You are missing nl_object_ops::oo_msgtypes etc.

ronand-atl · 2024-11-06T05:17:14Z

Hmm yeah the caching issue is a tricky one. My initial idea with this PR was to implement the functionality without caching, and then add caching later, once it's figured out. I'm also mindful that the initial design must not do anything that could compromise efforts to add caching later.

I'm thinking some of the caching code could be changed to allow offloading the message handling to an intermediary module that reads each attribute in the message and calls the appropriate handler from either the gopts or the non-gopts module. This would solve this problem you mentioned: "And you cannot have more than one cache per netlink message type". I'll need to have a closer look at the code though to see if this is a viable solution.

In response to your review:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.

Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?

You are not registering for any netlink message types or any netlink message groups, so the message parser is never invoked. You are missing nl_object_ops::oo_msgtypes etc.

The message parser does get invoked when requesting the gopts entries for a bridge master with rtnl_br_vlan_gopts_get_kernel. As we don't know about the best way to implement caching, there isn't any code that registers a cache yet.

thom311 · 2024-11-12T09:25:57Z

include/netlink/list.h

+static inline void nl_list_join(struct nl_list_head *head_1,
+				struct nl_list_head *head_2)
+{
+	nl_list_insert_list_after(head_2, head_1->prev);


Does this code come from linux kernel headers (and do they call the functionality this way)? If kernel does it, that would be a strong reason to name them the same (and basically implement them the same). I didn't check. The commit message should explain how this new API relates to the list API in kernel (and optimally, they are obviously related and similar).

With these linked lists, often the head is a special object and not a regular entry. That is, often we could not meaningfully call nl_list_entry(head, Node, lst_node). It should be clear what happens if head_1 or head_2 point to such a special head element (or not).

for example, we could define that head_1 can be either a special element or not (it doesn't matter). But that head_2 must point to a special head element which afterwards will be empty (and the content of list 2 moved over to head_1. Alternatively, we could define that head_2 points to a regular element, and the entire list 2 will be joined. That is, list 2 is never empty, it at least contains head_2 element. In the latter case, `head_2 would not be empty afterwards.

This should be defined (by having a code comment that explains how to use this). Optimally, we would also have unit tests, since it's not at all obvious whether this implementation is correct (which it probably is).

The same applies to nl_list_insert_list_after(). Also, don't do nl_list_join and nl_list_insert_list_after something similar? Couldn't nl_list_insert_list_after() be named something like nl_list_join_front() (am not not sure exactly, since I don't fully understand what they do(*))? However, their name is rather different. There should be code comment that explains the difference (and also makes it clear why they are named differently).

(*) yes, I could spend some time trying to understand what they do, and then reason backwards whether the name makes sense. However, the name should make it clear upfront, and optimally a code comment explains the non-obvious details beyond a good name. Then I'd be more inclined to reason whether the implementation actually implements what the name+doc indicates.

if the code is copied from linux kernel, then we must take care to not use invalid licenses (libnl is LGPL2.1-only).

Granted, those functions seem so minimal, I don't think you can implement them any other way.

If you want to get inspired, then compare to https://github.com/c-util/c-list/blob/main/src/c-list.h which has a compatible license. The downside is, we really want that the API looks similar to what is in kernel, so that somebody familiar with kernel API understand it.

Use your best judgment :)

I would merge this patch early, even if no users are merged yet. I think it's generally useful API beyond this PR.

I agree with all your points. I myself found it hard to understand how the list module worked since it had no documentation. It also didn't help that I'd never seen that type of implementation of a linked list before. I'll make a separate PR adds the extra methods and documents the module. I can add some unit tests as well if you could give me some pointers on how to do that.

Does this code come from linux kernel headers

No it doesn't. I hadn't looked at the code in the Linux kernel for their list implementation until now. I wrote the code based on a method I worked out on paper. The closest thing the kernel has seems to be list_splice_tail_init in include/linux/list.h. You seem to be able to use that function to do the same thing. It just has the parameters swapped around I think.

I myself found it hard to understand how the list module worked since it had no documentation.

With API like nl_list_add_tail() it's mostly clear what the function is going to do. On the other hand, with nl_list_join() there are more details that are not obvious.

I can add some unit tests as well if you could give me some pointers on how to do that.

In this case, unit tests would go to tests/check-direct.c. This test statically links the libraries and runs without a separate namespace, so it's suited for testing "pure" internal code.

The closest thing the kernel has [...]. It just has the parameters swapped around I think.

That might be an argument for following kernel names and API. On the other hand, there is the potential licensing concern (??) and maybe c-list.h (below) would still be nicer.

I wrote the code based on a method I worked out on paper.

That's great :)

I am quite a fan of https://github.com/c-util/c-list/blob/main/src/c-list.h

It's 2024 and I wonder, whether we shouldn't re-use an existing, solid implementation. Granted, a circular, intrusive linked list is relatively easy to reimplement. But should we? In NetworkManager, c-list is used via git-subtree. Since it's header-only, it's easy to use.

Import with

git subtree add --prefix shared/c-list/ [email protected]:c-util/c-list.git main --squash

upgrade (if ever necessary) with

git subtree pull --prefix shared/c-list/ [email protected]:c-util/c-list.git main --squash

and then add shared/c-list/src/ to Makefile.am as

default_includes = \ »·······-I$(srcdir)/include/linux-private \

I would do that. WDYT?

I've checked and the library provides a means to do everything we can currently do with our implementation, except it it doesn't have reverse iteration macros. Those would be easy enough to add, but that might mean the code would have to copied over instead of using a subtree (or another file includes that file to add to it).

I think the point would be that we submit such features to c-list "upstream", and just do a git subtree pull. That requires that upstream is susceptible to such patches. I think upstream is, otherwise there would be a problem with pulling it.

Thereby others could potentially benefit to that contribution, not just some obscure part inside libnl3.

Of course it will still take some work to convert the existing code to use the new interface.

I don't think we need to rework existing code (unless somebody wants to send patches). It could be used only for new code.

how about #412 ?

Looks good. I'll also make a PR that adds reverse iterators to that library. I've added the code; I've just got to adapt some of their tests. I'll probably submit that tomorrow.

Here's the PR I made, for reference: c-util/c-list#14

lgtm.

In the meantime, for your WIP branch here, I think you can do:

git subtree pull --prefix third_party/c-list/ [email protected]:c-util/c-list.git refs/pull/14/head --squash

[[email protected]: Taken from [1].] [1] #407

include/netlink/netlink.h

libnl-route-3.sym

The following methods are added: - nl_list_insert_before - nl_list_insert_after - nl_list_insert_list_after - nl_list_join - nl_list_last_entry - nl_list_for_each_entry_reverse - nl_list_for_each_entry_safe_reverse These methods are added to support an upcoming patch implementing bridge vlan support which requires additional list operations to be available.

In order to support an upcoming patch implementing support for requesting bridge vlan information for a single interface, a modification to the pickup code is needed to allow handling of multipart messages. Based on the implementation of the bridge vlan dumping code in the kernel, information relating to a single interface may not all be contained within a single message. If the maximum message size is exceeded before all entries can be filled in, a new message will be started with the same ifindex but containing the remaining entries. The code is currently only designed for handling netlink responses where the provided parser function will be called a maximum of one time. This is based on the assumption that all data relating to an object will be contained within a single response message. If multiple messages are received and the parser function is invoked multiple times, previously stored objects will be leaked and the resulting object will only contain information from the last parsed message. In order to handle the case where multiple messages need to be parsed to construct a single object, use the already existing oo_update function that may be defined for an object. The function will be called to merge partial objects (that are the result of each invocation of the parser function) together into a single object that is returned at the end.

Add br_vlan global_opts module to support setting global VLAN options on bridge master interfaces. There is currently no support for cache operations and the non-global options counterpart is not implemented.

mcr · 2024-11-13T18:15:01Z

looks like a clone of BSD sys/queue.h

ronand-atl · 2024-11-14T05:21:08Z

@mcr Were you referring to this one? https://github.com/freebsd/freebsd-src/blob/main/sys/sys/queue.h
The code doesn't look too similar.

KanjiMonster · 2024-11-18T10:47:16Z

In response to your review:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.

Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?

One benefit is that look ups for specific vlan's options is quicker, since you can make use of the cache (and hashing). And you can make use of libnl filtering (e.g. "give me all vlan opts with msti set to X"). No way to do that if all vlans are part of the same object.

Since these are global options, there will only be one object per bridge, and in most scenarios, there will only be one bridge. So in practice, having all vlans in one object means there is only one object, so the cache becomes meaningless. And if you want to look up any info, you need to iterate through the linked list(s).

So from a consumer perspective having one object per vlan is much easier to work with.

ronand-atl · 2024-11-20T04:33:28Z

@KanjiMonster I see. I'll see if I can come up with a revised interface for the module then. Hopefully most of my code is reusable.

thom311 reviewed Oct 2, 2024

View reviewed changes

lib/route/br_vlan.c Outdated Show resolved Hide resolved

lib/route/br_vlan.c Outdated Show resolved Hide resolved

ronand-atl mentioned this pull request Oct 7, 2024

lib/route: add support for bridge msti #409

Closed

thom311 added the change-requested label Oct 30, 2024

ronand-atl force-pushed the feat-br-vlan-global-opts branch from f556d1e to e33b8bc Compare October 30, 2024 23:07

ronand-atl changed the title ~~lib/route: add initial support for br_vlan module (WIP)~~ lib/route: add initial support for br_vlan global_opts module Oct 30, 2024

ronand-atl force-pushed the feat-br-vlan-global-opts branch from e33b8bc to d9c1289 Compare October 31, 2024 00:23

thom311 reviewed Nov 12, 2024

View reviewed changes

thom311 pushed a commit that referenced this pull request Nov 12, 2024

maint: fix inconsistent formatting in netlink.h

bdfdf73

[[email protected]: Taken from [1].] [1] #407

thom311 reviewed Nov 12, 2024

View reviewed changes

include/netlink/netlink.h Outdated Show resolved Hide resolved

thom311 reviewed Nov 12, 2024

View reviewed changes

libnl-route-3.sym Show resolved Hide resolved

ronand-atl added 3 commits November 13, 2024 11:23

lib/route: add initial support for br_vlan module

66659a1

Add br_vlan global_opts module to support setting global VLAN options on bridge master interfaces. There is currently no support for cache operations and the non-global options counterpart is not implemented.

ronand-atl force-pushed the feat-br-vlan-global-opts branch from d9c1289 to 66659a1 Compare November 12, 2024 22:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lib/route: add initial support for br_vlan global_opts module #407

lib/route: add initial support for br_vlan global_opts module #407

ronand-atl commented Oct 1, 2024

thom311 commented Oct 2, 2024

KanjiMonster commented Oct 11, 2024

KanjiMonster commented Oct 11, 2024

ronand-atl commented Oct 14, 2024

KanjiMonster commented Oct 14, 2024

ronand-atl commented Oct 30, 2024

KanjiMonster commented Oct 31, 2024

ronand-atl commented Oct 31, 2024

KanjiMonster commented Nov 1, 2024

KanjiMonster commented Nov 1, 2024

ronand-atl commented Nov 6, 2024

thom311 Nov 12, 2024 •

edited

Loading

thom311 Nov 12, 2024

thom311 Nov 12, 2024

ronand-atl Nov 13, 2024

thom311 Nov 13, 2024

thom311 Nov 14, 2024

thom311 Nov 14, 2024

ronand-atl Nov 18, 2024

ronand-atl Nov 21, 2024

thom311 Nov 21, 2024

mcr commented Nov 13, 2024 via email

ronand-atl commented Nov 14, 2024

KanjiMonster commented Nov 18, 2024

ronand-atl commented Nov 20, 2024

lib/route: add initial support for br_vlan global_opts module #407

Are you sure you want to change the base?

lib/route: add initial support for br_vlan global_opts module #407

Conversation

ronand-atl commented Oct 1, 2024

thom311 commented Oct 2, 2024

KanjiMonster commented Oct 11, 2024

KanjiMonster commented Oct 11, 2024

ronand-atl commented Oct 14, 2024

KanjiMonster commented Oct 14, 2024

ronand-atl commented Oct 30, 2024

KanjiMonster commented Oct 31, 2024

ronand-atl commented Oct 31, 2024

KanjiMonster commented Nov 1, 2024

KanjiMonster commented Nov 1, 2024

ronand-atl commented Nov 6, 2024

thom311 Nov 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mcr commented Nov 13, 2024 via email

ronand-atl commented Nov 14, 2024

KanjiMonster commented Nov 18, 2024

ronand-atl commented Nov 20, 2024

thom311 Nov 12, 2024 •

edited

Loading