Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/route: add initial support for br_vlan global_opts module #407

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ronand-atl
Copy link
Contributor

I am working on adding initial support for a new bridge vlan module for libnl in order to support RTM_{NEW,DEL,GET}VLAN messages. Our initial need is to be able to set bridge vlan global options, which are sent with the BRIDGE_VLANDB_GLOBAL_OPTIONS nested attribute.

I would like to design the initial support such that extending the module to support BRIDGE_VLANDB_ENTRY, GET and DEL requests, and caching is possible.

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

lib/route/br_vlan.c Outdated Show resolved Hide resolved
lib/route/br_vlan.c Outdated Show resolved Hide resolved
@thom311
Copy link
Owner

thom311 commented Oct 2, 2024

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them.

Overall, the patch seems to be on the right track (I think!). Thank you.

@KanjiMonster
Copy link
Contributor

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.

Maybe we can find a way of combining the effort.

@KanjiMonster
Copy link
Contributor

I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.

Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them.

Overall, the patch seems to be on the right track (I think!). Thank you.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about.

So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.

So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

@ronand-atl
Copy link
Contributor Author

@KanjiMonster Hey, thanks for reaching out :)

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.

Maybe we can find a way of combining the effort.

Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the build_bridge_vlan_msg function doesn't seem to do anything? Maybe that part still needs to be implemented.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about.

So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.

So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module route/br_vlan/global_opts.c. I'm working on a revised version which I should be able to push soon. If they are different modules then there shouldn't be too much overlap. If we want to implement caching then that looks like it might take extra work, demultiplexing nested attributes in messages sent from the kernel through a common br_vlan module, or having both modules process them and ignore nested attributes it doesn't care about.

If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once.

@KanjiMonster
Copy link
Contributor

@KanjiMonster Hey, thanks for reaching out :)

FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented.
Maybe we can find a way of combining the effort.

Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the build_bridge_vlan_msg function doesn't seem to do anything? Maybe that part still needs to be implemented.

Right, we are only listening to netlink, not setting anything, so we only cared about getting notified about changes part (and being able to look stuff up).

For context, we are listing to changes in fdb/bridge/routes/etc to write the state/configuration down to a switch ASIC so these things get hardware accelerated in the background. A bit like switchdev, but in userspace.

This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about.
So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc.
So in a way it does not make much sense to have both share a cache, since they have no overlap in information.

The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module route/br_vlan/global_opts.c. I'm working on a revised version which I should be able to push soon. If they are different modules then there shouldn't be too much overlap. If we want to implement caching then that looks like it might take extra work, demultiplexing nested attributes in messages sent from the kernel through a common br_vlan module, or having both modules process them and ignore nested attributes it doesn't care about.

But caching is the interesting part in libnl ;-). And the complicated one, since IIRC libnl only allows one module per message type, so we can't have two modules for RTM_NEWVLAN. So we would need to change libnl to allow that, but at the same time it would also benefit the mdb implementation.

And I just remembered my "programming is hard, let's go shopping" moment when I initially looked at implementing it:

The kernel does not send out a initial global opts message when a new vlan group is created (the first time a vlan is assigned to any bridge member), only if any global options are changed it sends out one. So you won't know that a vlan is assigned MSTI 0 unless you explicitly ask, or change any other option.

Likewise, the kernel does not send out a RTM_DELVLAN for global opts if a vlan group is removed (the last member of a vlan is removed). This resets all vlan global opts, but user space won't know unless userspace does reference tracking. Or tries to get the options for the vlan, which will then result in an error.

And lastly, kernel does not send out any messages to userspace about STP state changes for MSTI, or VLAN STP state changes as a result of them.

Maybe I should just send in kernel patches for these instead of trying to come up with a solution for this in userspace ... .

If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once.

I can't do any promises on how much time I can spend on it, but apart from that we can do that.

@ronand-atl ronand-atl force-pushed the feat-br-vlan-global-opts branch from f556d1e to e33b8bc Compare October 30, 2024 23:07
@ronand-atl ronand-atl changed the title lib/route: add initial support for br_vlan module (WIP) lib/route: add initial support for br_vlan global_opts module Oct 30, 2024
@ronand-atl
Copy link
Contributor Author

I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support.

Some more testing will need to be done to ensure everything works as expected before merging.

@ronand-atl ronand-atl force-pushed the feat-br-vlan-global-opts branch from e33b8bc to d9c1289 Compare October 31, 2024 00:23
@KanjiMonster
Copy link
Contributor

I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support.

Some more testing will need to be done to ensure everything works as expected before merging.

How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects?

The latter would be rather inefficient, and I would like to avoid that. Ideally there is one object per {vlan,port,bridge}.

@ronand-atl
Copy link
Contributor Author

How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects?

I'm not 100% sure what you mean. Could you elaborate? Are you talking about the oo_update function? Currently that's written to just handle the response of a dump request, and not update notifications. To handle update notifications it would need to merge the new entries into the existing object (adding or replacing entries). A two pointer strategy could be used do the operation in O(n + m) time without needing to modify the data structure.

Ideally there is one object per {vlan,port,bridge}.

I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan.

@KanjiMonster
Copy link
Contributor

I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan.

This is the core issue, you can't have multiple object types in one cache. Either the cache has global vlan objects, or it has vlan entry objects. It cannot have both.

And you cannot have more than one cache per netlink message type. Once you register your global vlan options object type for RTM_{NEW,DEL}VLAN, you cannot add one for vlan entry objects, and vice versa.

My best Idea so far had been to have a "type" attribute for the object, which tells whether it is a global opts or a vlan entry type. So the id attributes become {ifindex,vid,type}, and depending on the attributes in the RTM_{NEW,DEL}VLAN you would update/create/remove different type of objects.

There is no issue in sending multiple object notifications from one message, libnl handles that fine.

Then maybe we also keep a reference of the vid entry in the gopts, as gopts live only as long as there are vlan members, once the last member is removed, the gopts get implicitly reset to defaults (but there is no explicit netlink message for that).

@KanjiMonster
Copy link
Contributor

Though in general as a quick glance review for this PR:

  • I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.
  • You are not registering for any netlink message types or any netlink message groups, so the message parser is never invoked. You are missing nl_object_ops::oo_msgtypes etc.

@ronand-atl
Copy link
Contributor Author

Hmm yeah the caching issue is a tricky one. My initial idea with this PR was to implement the functionality without caching, and then add caching later, once it's figured out. I'm also mindful that the initial design must not do anything that could compromise efforts to add caching later.

I'm thinking some of the caching code could be changed to allow offloading the message handling to an intermediary module that reads each attribute in the message and calls the appropriate handler from either the gopts or the non-gopts module. This would solve this problem you mentioned: "And you cannot have more than one cache per netlink message type". I'll need to have a closer look at the code though to see if this is a viable solution.

In response to your review:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.

Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?

You are not registering for any netlink message types or any netlink message groups, so the message parser is never invoked. You are missing nl_object_ops::oo_msgtypes etc.

The message parser does get invoked when requesting the gopts entries for a bridge master with rtnl_br_vlan_gopts_get_kernel. As we don't know about the best way to implement caching, there isn't any code that registers a cache yet.

static inline void nl_list_join(struct nl_list_head *head_1,
struct nl_list_head *head_2)
{
nl_list_insert_list_after(head_2, head_1->prev);
Copy link
Owner

@thom311 thom311 Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code come from linux kernel headers (and do they call the functionality this way)? If kernel does it, that would be a strong reason to name them the same (and basically implement them the same). I didn't check. The commit message should explain how this new API relates to the list API in kernel (and optimally, they are obviously related and similar).


With these linked lists, often the head is a special object and not a regular entry. That is, often we could not meaningfully call nl_list_entry(head, Node, lst_node). It should be clear what happens if head_1 or head_2 point to such a special head element (or not).

for example, we could define that head_1 can be either a special element or not (it doesn't matter). But that head_2 must point to a special head element which afterwards will be empty (and the content of list 2 moved over to head_1. Alternatively, we could define that head_2 points to a regular element, and the entire list 2 will be joined. That is, list 2 is never empty, it at least contains head_2 element. In the latter case, `head_2 would not be empty afterwards.

This should be defined (by having a code comment that explains how to use this). Optimally, we would also have unit tests, since it's not at all obvious whether this implementation is correct (which it probably is).


The same applies to nl_list_insert_list_after(). Also, don't do nl_list_join and nl_list_insert_list_after something similar? Couldn't nl_list_insert_list_after() be named something like nl_list_join_front() (am not not sure exactly, since I don't fully understand what they do(*))? However, their name is rather different. There should be code comment that explains the difference (and also makes it clear why they are named differently).

(*) yes, I could spend some time trying to understand what they do, and then reason backwards whether the name makes sense. However, the name should make it clear upfront, and optimally a code comment explains the non-obvious details beyond a good name. Then I'd be more inclined to reason whether the implementation actually implements what the name+doc indicates.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the code is copied from linux kernel, then we must take care to not use invalid licenses (libnl is LGPL2.1-only).

Granted, those functions seem so minimal, I don't think you can implement them any other way.

If you want to get inspired, then compare to https://github.com/c-util/c-list/blob/main/src/c-list.h which has a compatible license. The downside is, we really want that the API looks similar to what is in kernel, so that somebody familiar with kernel API understand it.

Use your best judgment :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would merge this patch early, even if no users are merged yet. I think it's generally useful API beyond this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with all your points. I myself found it hard to understand how the list module worked since it had no documentation. It also didn't help that I'd never seen that type of implementation of a linked list before. I'll make a separate PR adds the extra methods and documents the module. I can add some unit tests as well if you could give me some pointers on how to do that.

Does this code come from linux kernel headers

No it doesn't. I hadn't looked at the code in the Linux kernel for their list implementation until now. I wrote the code based on a method I worked out on paper. The closest thing the kernel has seems to be list_splice_tail_init in include/linux/list.h. You seem to be able to use that function to do the same thing. It just has the parameters swapped around I think.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I myself found it hard to understand how the list module worked since it had no documentation.

With API like nl_list_add_tail() it's mostly clear what the function is going to do. On the other hand, with nl_list_join() there are more details that are not obvious.

I can add some unit tests as well if you could give me some pointers on how to do that.

In this case, unit tests would go to tests/check-direct.c. This test statically links the libraries and runs without a separate namespace, so it's suited for testing "pure" internal code.

The closest thing the kernel has [...]. It just has the parameters swapped around I think.

That might be an argument for following kernel names and API. On the other hand, there is the potential licensing concern (??) and maybe c-list.h (below) would still be nicer.

I wrote the code based on a method I worked out on paper.

That's great :)


I am quite a fan of https://github.com/c-util/c-list/blob/main/src/c-list.h

It's 2024 and I wonder, whether we shouldn't re-use an existing, solid implementation. Granted, a circular, intrusive linked list is relatively easy to reimplement. But should we? In NetworkManager, c-list is used via git-subtree. Since it's header-only, it's easy to use.

Import with

git subtree add --prefix shared/c-list/ [email protected]:c-util/c-list.git main --squash

upgrade (if ever necessary) with

git subtree pull --prefix shared/c-list/ [email protected]:c-util/c-list.git main --squash

and then add shared/c-list/src/ to Makefile.am as

default_includes = \
»·······-I$(srcdir)/include/linux-private \

I would do that. WDYT?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've checked and the library provides a means to do everything we can currently do with our implementation, except it it doesn't have reverse iteration macros. Those would be easy enough to add, but that might mean the code would have to copied over instead of using a subtree (or another file includes that file to add to it).

I think the point would be that we submit such features to c-list "upstream", and just do a git subtree pull. That requires that upstream is susceptible to such patches. I think upstream is, otherwise there would be a problem with pulling it.

Thereby others could potentially benefit to that contribution, not just some obscure part inside libnl3.

Of course it will still take some work to convert the existing code to use the new interface.

I don't think we need to rework existing code (unless somebody wants to send patches). It could be used only for new code.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about #412 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'll also make a PR that adds reverse iterators to that library. I've added the code; I've just got to adapt some of their tests. I'll probably submit that tomorrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the PR I made, for reference: c-util/c-list#14

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

In the meantime, for your WIP branch here, I think you can do:

git subtree pull --prefix third_party/c-list/ [email protected]:c-util/c-list.git refs/pull/14/head --squash

thom311 pushed a commit that referenced this pull request Nov 12, 2024
include/netlink/netlink.h Outdated Show resolved Hide resolved
The following methods are added:
- nl_list_insert_before
- nl_list_insert_after
- nl_list_insert_list_after
- nl_list_join
- nl_list_last_entry
- nl_list_for_each_entry_reverse
- nl_list_for_each_entry_safe_reverse

These methods are added to support an upcoming patch implementing bridge
vlan support which requires additional list operations to be available.
In order to support an upcoming patch implementing support for
requesting bridge vlan information for a single interface, a
modification to the pickup code is needed to allow handling of multipart
messages.

Based on the implementation of the bridge vlan dumping code in the
kernel, information relating to a single interface may not all be
contained within a single message. If the maximum message size is
exceeded before all entries can be filled in, a new message will be
started with the same ifindex but containing the remaining entries.

The code is currently only designed for handling netlink responses where
the provided parser function will be called a maximum of one time. This
is based on the assumption that all data relating to an object will be
contained within a single response message. If multiple messages are
received and the parser function is invoked multiple times, previously
stored objects will be leaked and the resulting object will only contain
information from the last parsed message.

In order to handle the case where multiple messages need to be parsed to
construct a single object, use the already existing oo_update function
that may be defined for an object. The function will be called to merge
partial objects (that are the result of each invocation of the parser
function) together into a single object that is returned at the end.
Add br_vlan global_opts module to support setting global VLAN options on
bridge master interfaces.

There is currently no support for cache operations and the non-global
options counterpart is not implemented.
@ronand-atl ronand-atl force-pushed the feat-br-vlan-global-opts branch from d9c1289 to 66659a1 Compare November 12, 2024 22:32
@mcr
Copy link

mcr commented Nov 13, 2024 via email

@ronand-atl
Copy link
Contributor Author

@mcr Were you referring to this one? https://github.com/freebsd/freebsd-src/blob/main/sys/sys/queue.h
The code doesn't look too similar.

@KanjiMonster
Copy link
Contributor

In response to your review:

I think there should be one gopts object per vlan per bridge. IIUC there are no global gopts that apply to all vlans, but each option is per vlan.

Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?

One benefit is that look ups for specific vlan's options is quicker, since you can make use of the cache (and hashing). And you can make use of libnl filtering (e.g. "give me all vlan opts with msti set to X"). No way to do that if all vlans are part of the same object.

Since these are global options, there will only be one object per bridge, and in most scenarios, there will only be one bridge. So in practice, having all vlans in one object means there is only one object, so the cache becomes meaningless. And if you want to look up any info, you need to iterate through the linked list(s).

So from a consumer perspective having one object per vlan is much easier to work with.

@ronand-atl
Copy link
Contributor Author

@KanjiMonster I see. I'll see if I can come up with a revised interface for the module then. Hopefully most of my code is reusable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants