-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Babeld without VLAN on ethernet interfaces inside br-lan, replaces #600 #631
base: master
Are you sure you want to change the base?
Conversation
local uci = config.get_uci_cursor() | ||
|
||
if(vlanId ~= 0 and ifname:match("^eth")) then | ||
if tonumber(vlanId) ~= 0 and ifname:match("^eth") then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm.... can vlanId be a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a very bad idea to set it as a string, but if an user sets it as a string, network.lua at this line
xpcall(function() proto.configure(args) ; proto.setup_interface(device, args) end, |
passes the string to the proto, and no protocol force it to be a number. e.g. babeld just takes it
lime-packages/packages/lime-proto-babeld/files/usr/lib/lua/lime/proto/babeld.lua
Line 102 in d81c12c
local vlanId = args[2] or 17 |
lime-config returns no error if a string is used and results in a nil value here
vid = tonumber(vid) |
and interfaces with appended a "_nil" string here
linux802adIfName = linux802adIfName:gsub("[^%w-]", "-")..network.protoVlanSeparator..vlanId |
I'm going to prepare a PR about this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not only this, it is always a string now as how it is parsed from the config. So yes, the conversion to numeric is always needed... (the problem is that we dont have a single test for this for example with vlanid=0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests (I can help!)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment in discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
babeld_over_batman
option should not be removed, is what permit us to eventually setup and test babel only on frontier nodes, as per the original libremesh scalability proposal
@G10h4ck ah... Rly? This scenario is the number 1 feature in LibreMesh and I did not understand how it should work until now. More documentation is needed (this is not explained neither in the How it works). I will add that option back as soon as possible.
if ifname:match("^wlan") and tonumber(vlanId) == 0 then | ||
uci:set("network", owrtInterfaceName, "ifname", "@"..owrtDeviceName) | ||
else | ||
uci:set("network", owrtInterfaceName, "ifname", linuxVlanIfName) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spiccinini @G10h4ck @nicopace @gmarcos87
Please review this, I think is the most likely-to-be-suboptimal part of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is not a new issue, we have workarounds for similar thing in other places of the code, it seems that when one uses linuxVlanIfName
this trigger a race condition inside netifd due to some unsolved upstream bug, and that causes unreliable behaviour, if with this workaround it works well in all cases (I assume you tested it a lot) I am ok to merge it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did test it on 5 different routers (listed in the first comment in the PR), but more testing would also be nice to have :)
Can you point me to the other places in the code where such a workaround is used (so that I can compare solutions)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
babeld_over_batman
option should not be removed, is what permit us to eventually setup and test babel only on frontier nodes, as per the original libremesh scalability proposal@G10h4ck ah... Rly? This scenario is the number 1 feature in LibreMesh and I did not understand how it should work until now. More documentation is needed (this is not explained neither in the How it works). I will add that option back as soon as possible.
To implement libremesh full proposal is not a simple task, so we started from the easier part, which is solvable more or less just by generating a static configuration when lime-config is run. The other parts of the proposal (for example automatically turn on L3 routing protocol only on frontier nodes) needs more thins going on at run-time, and we are implementing those things bit by bit (shared-state is one of those things). In the meanwhile even if we don't have all those pieces in place is important to keep the pieces we already have compatible with that scenario, so when we get there it is not too overwhelming to make it work.
I just added back the babeld_over_batman option, I just took the code for it from #600 so it should work (not tested now, but I tested some times ago in the previous PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spiccinini could you test this on LibreRouter? Two scenarios should be tested: with VLAN and without VLAN.
To verify that it works: issue echo dump | nc ::1 30003
and check if all the included interfaces (apart from the WAN one) have an IPv4 in both VLAN and noVLAN configurations. In the noVLAN configuration the br-lan interface has to be present in the list of the Babeld interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I used the code from this PR (rebased to master). Maybe I did a mistake. I deleted both /etc/config wireless and network, regenerated them with generate_config, then lime-config and lime-apply,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not familiar with config_generate and I never trusted lime-apply...
Maybe #75 is related?
@spiccinini could you try without deleting the config files, and just running lime-config and rebooting? Just for being sure...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I don't delete the config the vlan 17 still exists after lime-config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow... Even after the reboot? So something is failing baaaad.
Can you post the full /etc/config/network with in the two cases of VLAN set in /etc/config/lime
and VLAN set to zero in /etc/config/lime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did more testing and for me it works. Can you confirm that it fails with LibreRouter or with your compilation procedure?
Regarding the tests, by myself I am not able to write them, can you do it or guide me?
To implement libremesh full proposal is not a simple task, so we started from the easier part, which is solvable more or less just by generating a static configuration when lime-config is run. The other parts of the proposal (for example automatically turn on L3 routing protocol only on frontier nodes) needs more thins going on at run-time, and we are implementing those things bit by bit (shared-state is one of those things). In the meanwhile even if we don't have all those pieces in place is important to keep the pieces we already have compatible with that scenario, so when we get there it is not too overwhelming to make it work. |
if ifname:match("^wlan") then | ||
--! currently (2019-10-12) mode-ap and mode-apname have an hardcoded | ||
--! "option network lan" so they are always in the br-lan bridge | ||
if ifname:match("^wlan.*ap$") or ifname:match("^wlan.*apname$") then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the first part of this function there is the following code:
if not args["specific"] and ifname:match("^wlan%d+.ap") then
utils.log("lime.proto.babeld.setup_interface(...)", ifname, "ignored")
return
end
As far as I understand for ap and apname it won't reach this part of the code. Is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok maybe I am not understanding. In which conditions the program will get into the if of line 137?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean this if, right?
lime-packages/packages/lime-proto-babeld/files/usr/lib/lua/lime/proto/babeld.lua
Line 137 in a8369d9
if ifname:match("^wlan.*ap$") or ifname:match("^wlan.*apname$") then |
It gets hit when an AP or an APname is configured using interface-specific configuration (some documentation on the website here) and babeld is also selected in the interface-specific configuration.
An example of a case where configuring an AP with interface-specific configuration is here in #262 (5 GHz radio used just for mesh and 2.4 GHz radio used for AP+APname+mesh, which cannot currently be done with general configuration).
Something like: (the non-specified options in the wifi sections are taken from the generic wifi configuration, see db1c350. I don't know if more options should be specified for the net sections?)
config wifi radio0 # assuming that radio0 is 2.4 GHz
list modes 'ap'
list modes 'apname'
list modes 'ieee80211s'
config net wireless2ap
option linux_name 'wlan0-ap'
list protocols lan # this is hardcoded anyway in ap proto, adding also here just for clarity
list protocols babeld:17
config net wireless2apname
option linux_name 'wlan0-apname'
list protocols lan # this is hardcoded anyway in apname proto, adding also here just for clarity
list protocols babeld:17
config net wireless2mesh
option linux_name 'wlan0-mesh'
list protocols babeld:17
config wifi radio1 # assuming that radio1 is 5 GHz
list modes 'ieee80211s'
config net wireless5mesh
option linux_name 'wlan1-mesh'
list protocols babeld:17
(beware, I did not test this example)
Reminder for me: I have to check if the absurd workaround for the "known bug" (see #631 (comment) ) is still needed also on OpenWrt 19.07. @spiccinini did you do more testing on LibreRouter? I cannot understand how it can fail so terribly. |
This is #600 with very few modifications, like the removal of a conditional dependency.
Will fix #596 and is a step towards #581.
I tested on various routers (YouHua WR1200JS, TP-Link WDR3600, Ubiquiti NanoStation LoCo M2, Comtrend AR-5387un, Comtrend AR-5315u) with or without VLAN.
This PR allows to use YouHua WR1200JS also with ethernet connections (there's a drivers bug to be confirmed which maybe affects all the Mediatek routers) when setting VLAN 0 (no VLAN).