-
Notifications
You must be signed in to change notification settings - Fork 315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/route: add initial support for br_vlan global_opts module #407
base: main
Are you sure you want to change the base?
Conversation
Sorry, I cannot competently comment on this question. It requires to well understand the kernel API, that is, how are those things exposed on the netlink API, so we can find which objects can best represent them. Overall, the patch seems to be on the right track (I think!). Thank you. |
FWIW, we have implemented the VLANDB part at https://github.com/bisdn/meta-bisdn-linux/blob/main/recipes-support/libnl/libnl/0008-bridge-vlan-add-per-vlan-stp-state-object-and-cache.patch, but I havent submitted that yet due to the global options part not being implemented. Maybe we can find a way of combining the effort. |
This is a similar "issue" here that I also mentioned at #390 in that the same netlink group has two very different kind of objects it notifies about. So for VLANDB there is the global options which exist once per vlan per bridge which has MSTI assignment and various options set for the whole vlan, and then there is are per-bridge member per-vlan options, like the VID, whether it is tagged, pvid, STP state, stats etc. So in a way it does not make much sense to have both share a cache, since they have no overlap in information. |
@KanjiMonster Hey, thanks for reaching out :)
Nice, luckily we didn't end up duplicating effort. I had a quick look over the patch linked and I noticed the
The global options seem to be conceptually different from the per vlan per port options, so I'm planning on having them in their own module If you're interested, we could set up a branch on a forked repo so we can work together on a more complete module that can be upstreamed all at once. |
Right, we are only listening to netlink, not setting anything, so we only cared about getting notified about changes part (and being able to look stuff up). For context, we are listing to changes in fdb/bridge/routes/etc to write the state/configuration down to a switch ASIC so these things get hardware accelerated in the background. A bit like switchdev, but in userspace.
But caching is the interesting part in libnl ;-). And the complicated one, since IIRC libnl only allows one module per message type, so we can't have two modules for RTM_NEWVLAN. So we would need to change libnl to allow that, but at the same time it would also benefit the mdb implementation. And I just remembered my "programming is hard, let's go shopping" moment when I initially looked at implementing it: The kernel does not send out a initial global opts message when a new vlan group is created (the first time a vlan is assigned to any bridge member), only if any global options are changed it sends out one. So you won't know that a vlan is assigned MSTI 0 unless you explicitly ask, or change any other option. Likewise, the kernel does not send out a RTM_DELVLAN for global opts if a vlan group is removed (the last member of a vlan is removed). This resets all vlan global opts, but user space won't know unless userspace does reference tracking. Or tries to get the options for the vlan, which will then result in an error. And lastly, kernel does not send out any messages to userspace about STP state changes for MSTI, or VLAN STP state changes as a result of them. Maybe I should just send in kernel patches for these instead of trying to come up with a solution for this in userspace ... .
I can't do any promises on how much time I can spend on it, but apart from that we can do that. |
f556d1e
to
e33b8bc
Compare
I've pushed a new version of the code which finishes implementing this module. This module should work as is for allowing the user to modify and query global VLAN options of a bridge master interface. The non-global counterpart, if there is interest, would be implemented in a separate file, followed by caching support. Some more testing will need to be done to ensure everything works as expected before merging. |
e33b8bc
to
d9c1289
Compare
How does cache update reporting work for those merged objects? Will the callback know what was merged, or will it need to do its own diff of the "old" and "new" objects? The latter would be rather inefficient, and I would like to avoid that. Ideally there is one object per {vlan,port,bridge}. |
I'm not 100% sure what you mean. Could you elaborate? Are you talking about the oo_update function? Currently that's written to just handle the response of a dump request, and not update notifications. To handle update notifications it would need to merge the new entries into the existing object (adding or replacing entries). A two pointer strategy could be used do the operation in O(n + m) time without needing to modify the data structure.
I currently have one object per bridge master interface which has an entry for each vlan. This applies to the global options, but I'd expect the non global options to also have an object per bridge master or port with an entry per vlan. |
This is the core issue, you can't have multiple object types in one cache. Either the cache has global vlan objects, or it has vlan entry objects. It cannot have both. And you cannot have more than one cache per netlink message type. Once you register your global vlan options object type for My best Idea so far had been to have a "type" attribute for the object, which tells whether it is a global opts or a vlan entry type. So the id attributes become {ifindex,vid,type}, and depending on the attributes in the RTM_{NEW,DEL}VLAN you would update/create/remove different type of objects. There is no issue in sending multiple object notifications from one message, libnl handles that fine. Then maybe we also keep a reference of the vid entry in the gopts, as gopts live only as long as there are vlan members, once the last member is removed, the gopts get implicitly reset to defaults (but there is no explicit netlink message for that). |
Though in general as a quick glance review for this PR:
|
Hmm yeah the caching issue is a tricky one. My initial idea with this PR was to implement the functionality without caching, and then add caching later, once it's figured out. I'm also mindful that the initial design must not do anything that could compromise efforts to add caching later. I'm thinking some of the caching code could be changed to allow offloading the message handling to an intermediary module that reads each attribute in the message and calls the appropriate handler from either the gopts or the non-gopts module. This would solve this problem you mentioned: "And you cannot have more than one cache per netlink message type". I'll need to have a closer look at the code though to see if this is a viable solution. In response to your review:
Perhaps, I'm not sure. Is there a benefit to doing that? Wouldn't that make it more annoying to get and update entries for a single interface since now there's potentially many more entries that aren't grouped or ordered in any way?
The message parser does get invoked when requesting the gopts entries for a bridge master with |
static inline void nl_list_join(struct nl_list_head *head_1, | ||
struct nl_list_head *head_2) | ||
{ | ||
nl_list_insert_list_after(head_2, head_1->prev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this code come from linux kernel headers (and do they call the functionality this way)? If kernel does it, that would be a strong reason to name them the same (and basically implement them the same). I didn't check. The commit message should explain how this new API relates to the list API in kernel (and optimally, they are obviously related and similar).
With these linked lists, often the head is a special object and not a regular entry. That is, often we could not meaningfully call nl_list_entry(head, Node, lst_node)
. It should be clear what happens if head_1
or head_2
point to such a special head element (or not).
for example, we could define that head_1
can be either a special element or not (it doesn't matter). But that head_2
must point to a special head element which afterwards will be empty (and the content of list 2 moved over to head_1
. Alternatively, we could define that head_2
points to a regular element, and the entire list 2 will be joined. That is, list 2 is never empty, it at least contains head_2 element. In the latter case, `head_2 would not be empty afterwards.
This should be defined (by having a code comment that explains how to use this). Optimally, we would also have unit tests, since it's not at all obvious whether this implementation is correct (which it probably is).
The same applies to nl_list_insert_list_after()
. Also, don't do nl_list_join
and nl_list_insert_list_after
something similar? Couldn't nl_list_insert_list_after()
be named something like nl_list_join_front()
(am not not sure exactly, since I don't fully understand what they do(*))? However, their name is rather different. There should be code comment that explains the difference (and also makes it clear why they are named differently).
(*) yes, I could spend some time trying to understand what they do, and then reason backwards whether the name makes sense. However, the name should make it clear upfront, and optimally a code comment explains the non-obvious details beyond a good name. Then I'd be more inclined to reason whether the implementation actually implements what the name+doc indicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the code is copied from linux kernel, then we must take care to not use invalid licenses (libnl is LGPL2.1-only).
Granted, those functions seem so minimal, I don't think you can implement them any other way.
If you want to get inspired, then compare to https://github.com/c-util/c-list/blob/main/src/c-list.h which has a compatible license. The downside is, we really want that the API looks similar to what is in kernel, so that somebody familiar with kernel API understand it.
Use your best judgment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would merge this patch early, even if no users are merged yet. I think it's generally useful API beyond this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with all your points. I myself found it hard to understand how the list module worked since it had no documentation. It also didn't help that I'd never seen that type of implementation of a linked list before. I'll make a separate PR adds the extra methods and documents the module. I can add some unit tests as well if you could give me some pointers on how to do that.
Does this code come from linux kernel headers
No it doesn't. I hadn't looked at the code in the Linux kernel for their list implementation until now. I wrote the code based on a method I worked out on paper. The closest thing the kernel has seems to be list_splice_tail_init
in include/linux/list.h
. You seem to be able to use that function to do the same thing. It just has the parameters swapped around I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I myself found it hard to understand how the list module worked since it had no documentation.
With API like nl_list_add_tail()
it's mostly clear what the function is going to do. On the other hand, with nl_list_join()
there are more details that are not obvious.
I can add some unit tests as well if you could give me some pointers on how to do that.
In this case, unit tests would go to tests/check-direct.c
. This test statically links the libraries and runs without a separate namespace, so it's suited for testing "pure" internal code.
The closest thing the kernel has [...]. It just has the parameters swapped around I think.
That might be an argument for following kernel names and API. On the other hand, there is the potential licensing concern (??) and maybe c-list.h
(below) would still be nicer.
I wrote the code based on a method I worked out on paper.
That's great :)
I am quite a fan of https://github.com/c-util/c-list/blob/main/src/c-list.h
It's 2024 and I wonder, whether we shouldn't re-use an existing, solid implementation. Granted, a circular, intrusive linked list is relatively easy to reimplement. But should we? In NetworkManager, c-list is used via git-subtree
. Since it's header-only, it's easy to use.
Import with
git subtree add --prefix shared/c-list/ [email protected]:c-util/c-list.git main --squash
upgrade (if ever necessary) with
git subtree pull --prefix shared/c-list/ [email protected]:c-util/c-list.git main --squash
and then add shared/c-list/src/
to Makefile.am
as
default_includes = \
»·······-I$(srcdir)/include/linux-private \
I would do that. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked and the library provides a means to do everything we can currently do with our implementation, except it it doesn't have reverse iteration macros. Those would be easy enough to add, but that might mean the code would have to copied over instead of using a subtree (or another file includes that file to add to it).
I think the point would be that we submit such features to c-list "upstream", and just do a git subtree pull
. That requires that upstream is susceptible to such patches. I think upstream is, otherwise there would be a problem with pulling it.
Thereby others could potentially benefit to that contribution, not just some obscure part inside libnl3.
Of course it will still take some work to convert the existing code to use the new interface.
I don't think we need to rework existing code (unless somebody wants to send patches). It could be used only for new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about #412 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I'll also make a PR that adds reverse iterators to that library. I've added the code; I've just got to adapt some of their tests. I'll probably submit that tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the PR I made, for reference: c-util/c-list#14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
In the meantime, for your WIP branch here, I think you can do:
git subtree pull --prefix third_party/c-list/ [email protected]:c-util/c-list.git refs/pull/14/head --squash
[[email protected]: Taken from [1].] [1] #407
The following methods are added: - nl_list_insert_before - nl_list_insert_after - nl_list_insert_list_after - nl_list_join - nl_list_last_entry - nl_list_for_each_entry_reverse - nl_list_for_each_entry_safe_reverse These methods are added to support an upcoming patch implementing bridge vlan support which requires additional list operations to be available.
In order to support an upcoming patch implementing support for requesting bridge vlan information for a single interface, a modification to the pickup code is needed to allow handling of multipart messages. Based on the implementation of the bridge vlan dumping code in the kernel, information relating to a single interface may not all be contained within a single message. If the maximum message size is exceeded before all entries can be filled in, a new message will be started with the same ifindex but containing the remaining entries. The code is currently only designed for handling netlink responses where the provided parser function will be called a maximum of one time. This is based on the assumption that all data relating to an object will be contained within a single response message. If multiple messages are received and the parser function is invoked multiple times, previously stored objects will be leaked and the resulting object will only contain information from the last parsed message. In order to handle the case where multiple messages need to be parsed to construct a single object, use the already existing oo_update function that may be defined for an object. The function will be called to merge partial objects (that are the result of each invocation of the parser function) together into a single object that is returned at the end.
Add br_vlan global_opts module to support setting global VLAN options on bridge master interfaces. There is currently no support for cache operations and the non-global options counterpart is not implemented.
d9c1289
to
66659a1
Compare
looks like a clone of BSD sys/queue.h
|
@mcr Were you referring to this one? https://github.com/freebsd/freebsd-src/blob/main/sys/sys/queue.h |
One benefit is that look ups for specific vlan's options is quicker, since you can make use of the cache (and hashing). And you can make use of libnl filtering (e.g. "give me all vlan opts with msti set to X"). No way to do that if all vlans are part of the same object. Since these are global options, there will only be one object per bridge, and in most scenarios, there will only be one bridge. So in practice, having all vlans in one object means there is only one object, so the cache becomes meaningless. And if you want to look up any info, you need to iterate through the linked list(s). So from a consumer perspective having one object per vlan is much easier to work with. |
@KanjiMonster I see. I'll see if I can come up with a revised interface for the module then. Hopefully most of my code is reusable. |
I am working on adding initial support for a new bridge vlan module for libnl in order to support
RTM_{NEW,DEL,GET}VLAN
messages. Our initial need is to be able to set bridge vlan global options, which are sent with theBRIDGE_VLANDB_GLOBAL_OPTIONS
nested attribute.I would like to design the initial support such that extending the module to support
BRIDGE_VLANDB_ENTRY
,GET
andDEL
requests, and caching is possible.I am looking for some initial feedback on the best way to approach this from someone who is more familiar designing new modules. One thing I am unsure about is what level the libnl object for the bridge vlan should be defined at. Currently I have it so that there is one object for all bridge vlan info, but maybe there should be one just for the global options and one for the entries. Both share the same message type so I'm unsure if that might introduce challenges in implementing caching.