Linux: add lag support and bonding plugin for non-lag cases #1624

Merged: 68 commits, merged on Jan 1, 2025

jbemmel (Collaborator) commented Dec 8, 2024

  • Refactor and reuse FRR logic for bond creation on Linux, both for the lag module and for the bonding plugin
  • Support all the Linux bonding modes, split into:
    • lag module: [ 802.3ad, balance-rr, balance-xor, broadcast ] - these require switch support
    • bonding plugin: [ active-backup, balance-tlb, balance-alb ]

In all cases, bonding devices are created in the initial module so that IP addresses can be assigned to them. Bond/lag members are then added by the lag module or the bonding plugin, respectively.
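A minimal sketch of that two-phase sequence expressed as iproute2 commands, assuming a bond named bond1 with members eth1 and eth2 (hypothetical names). The helper functions are illustrative only and are not netlab's actual template code.

```python
from typing import List


def bond_create_cmds(bond: str, mode: str) -> List[str]:
    """Phase 1 (initial module): create the bond so IP addresses can be assigned to it."""
    return [
        f"ip link add {bond} type bond mode {mode}",
        f"ip link set {bond} up",
    ]


def bond_member_cmds(bond: str, members: List[str]) -> List[str]:
    """Phase 2 (lag module or bonding plugin): attach the member interfaces."""
    cmds: List[str] = []
    for intf in members:
        # Taking the member down before enslaving it is a common precaution
        cmds.append(f"ip link set {intf} down")
        cmds.append(f"ip link set {intf} master {bond}")
        cmds.append(f"ip link set {intf} up")
    return cmds


if __name__ == "__main__":
    plan = bond_create_cmds("bond1", "active-backup") + bond_member_cmds("bond1", ["eth1", "eth2"])
    print("\n".join(plan))
```

Splitting the commands this way mirrors the description: the bond exists (and can carry IP addresses) before any module decides which interfaces become members.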

For containers, the commands execute in the container's network namespace, so there is no need to install iproute2 in the container.
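For illustration only: one common way to run the host's ip binary inside a container's network namespace is nsenter (util-linux), targeting the container's PID. This is an assumption about the mechanism; netlab may use a different one (for example ip netns exec on a named namespace), and netns_cmd is a hypothetical helper.

```python
import shlex
from typing import List


def netns_cmd(container_pid: int, cmd: str) -> List[str]:
    """Build an argv that runs `cmd` with host binaries inside the container's network namespace."""
    # --net enters only the network namespace of the target process; the mount
    # namespace stays the host's, so the ip binary comes from the host.
    return ["nsenter", "--target", str(container_pid), "--net"] + shlex.split(cmd)


if __name__ == "__main__":
    # The PID could come from `docker inspect -f '{{.State.Pid}}' <container>`;
    # 12345 is a made-up value.
    print(shlex.join(netns_cmd(12345, "ip link set eth1 master bond1")))
```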

Tested:

  • netlab up integration/lag/05-bonding-active-standby.yml -p clab
  • netlab up integration/lag/05-bonding-active-standby.yml -p libvirt
  • netlab up integration/lag/06-host-mlag-anycast-gateway.yml -p clab -d dellos10

ipspace (Owner) left a comment

A few minor details plus a major problem: we don't get the Linux static routes for the lab pools (they are supposed to point to the first data-plane interface), so the Linux host cannot reach anything but the adjacent router.

Fix the nits and I'll add the static routes part.

netsim/devices/linux.yml (outdated, resolved)
netsim/ansible/templates/lag/linux.j2 (outdated, resolved)
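A rough sketch of the missing piece described above: one static route per lab address pool, pointing at the next hop on the first data-plane interface. The pool names, prefixes, next hop and interface used in the example are hypothetical, and pool_routes is an illustrative helper, not the fix that was eventually applied.

```python
import ipaddress
from typing import Dict, List


def pool_routes(pools: Dict[str, str], gateway: str, intf: str) -> List[str]:
    """One static route per address pool, via the gateway on the first data-plane interface."""
    gw = ipaddress.ip_address(gateway)
    cmds: List[str] = []
    for prefix in pools.values():
        net = ipaddress.ip_network(prefix)
        if net.version != gw.version:
            continue  # an IPv4 next hop cannot serve an IPv6 pool (and vice versa)
        # "replace" instead of "add" keeps the command idempotent on re-runs
        cmds.append(f"ip route replace {net} via {gw} dev {intf}")
    return cmds


if __name__ == "__main__":
    example_pools = {"loopback": "10.0.0.0/24", "p2p": "10.1.0.0/16", "v6": "2001:db8::/48"}
    print("\n".join(pool_routes(example_pools, "10.1.0.1", "eth1")))
```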

ipspace (Owner) commented Dec 9, 2024

Also, just wondering: how does netplan-based Ubuntu configuration work with handcrafted bond devices?

jbemmel marked this pull request as draft December 9, 2024 14:25

jbemmel (Collaborator, Author) commented Dec 9, 2024

> Also, just wondering: how does netplan-based Ubuntu configuration work with handcrafted bond devices?

It shows them as "unmanaged", just like the 'lo' device (odd to see that), but I agree it would be nicer to generate netplan configurations where available. Will adjust.
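A sketch of what a generated netplan bond definition might look like, assuming netplan's version 2 schema (ethernets, bonds, interfaces, parameters.mode). The interface names and the output file name in the comment are hypothetical; the document is emitted as JSON, which YAML parsers generally accept, to keep the example dependency-free.

```python
import json
from typing import Any, Dict, List


def netplan_bond(bond: str, members: List[str], mode: str) -> Dict[str, Any]:
    """Netplan (version 2) document defining a bond over the given member interfaces."""
    return {
        "network": {
            "version": 2,
            # netplan expects bond members to be defined as ethernets as well
            "ethernets": {intf: {} for intf in members},
            "bonds": {
                bond: {
                    "interfaces": list(members),
                    "parameters": {"mode": mode},
                }
            },
        }
    }


if __name__ == "__main__":
    # Could be written to a file such as /etc/netplan/60-bond.yaml (hypothetical name)
    print(json.dumps(netplan_bond("bond1", ["eth1", "eth2"], "active-backup"), indent=2))
```

With a netplan-managed bond, networkd would show the bond as managed instead of "unmanaged".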

jbemmel force-pushed the linux_add_lag_support branch from 33e1ee4 to d20c7ef December 10, 2024 17:36

ipspace (Owner) commented Dec 11, 2024

> Fix the nits and I'll add the static routes part.

The static route issue is tracked in #1641. I will probably go for a simple fix in the configuration template (that code is ancient).

jbemmel force-pushed the linux_add_lag_support branch from a884b6f to 1cb4cea December 15, 2024 19:26
jbemmel changed the title from "Linux: add lag support" to "Linux: add lag support and bonding plugin for non-lag cases" December 15, 2024
jbemmel marked this pull request as ready for review December 16, 2024 03:44

jbemmel (Collaborator, Author) commented Dec 16, 2024

Ready to add the static route bits.

ipspace (Owner) left a comment

A bunch of confused comments and a few minor changes. Also, it looks like it's time for a rebase.

docs/caveats.md (outdated, resolved)
docs/plugins/bonding.md (outdated, resolved)
docs/plugins/bonding.md (resolved)
docs/plugins/bonding.md (outdated, resolved)
netsim/augment/links.py (outdated, resolved)
netsim/extra/bonding/plugin.py (resolved)
netsim/extra/bonding/plugin.py (outdated, resolved)
netsim/extra/bonding/plugin.py (outdated, resolved)
netsim/extra/bonding/plugin.py (outdated, resolved)

@@ -12,19 +12,21 @@ attributes:
  global:
    lacp: { type: str, valid_values: [ "off", "slow", "fast" ] }
    lacp_mode: { type: str, valid_values: [ "passive", "active" ] }
    mode: { type: str, valid_values: [ "802.3ad", "balance-xor", "active-backup" ] }

    # All Linux bonding modes that require lag configuration on the switch side

ipspace (Owner) commented:

The only bonding mode that requires switch configuration is "802.3ad" or am I missing something? Why exactly do you think we need Linux bonding modes in the generic LAG module?

jbemmel (Collaborator, Author) replied:

Based on https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/html/networking_guide/overview-of-bonding-modes-and-the-required-settings-on-the-switch#overview-of-bonding-modes-and-the-required-settings-on-the-switch

Table 7.1. Switch Configuration Settings Depending on the Bonding Modes

Bonding Mode       Configuration on the Switch
0 - balance-rr     Requires static Etherchannel enabled (not LACP-negotiated)
1 - active-backup  Requires autonomous ports
2 - balance-xor    Requires static Etherchannel enabled (not LACP-negotiated)
3 - broadcast      Requires static Etherchannel enabled (not LACP-negotiated)
4 - 802.3ad        Requires LACP-negotiated Etherchannel enabled
5 - balance-tlb    Requires autonomous ports
6 - balance-alb    Requires autonomous ports

So 0, 2, 3 and 4 require lag configuration on the switch side, with only 4 including LACP.

Modes 1, 5 and 6 are handled by the bonding plugin (server-side configuration only).
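The split argued above can be summarized as a small lookup table. This is a sketch derived from the quoted RHEL 7 table; BONDING_MODES and handled_by are hypothetical names, not netlab's actual attribute definitions.

```python
from typing import Dict, Tuple

# Mode number -> (Linux mode name, switch-side requirement), per Table 7.1 above
BONDING_MODES: Dict[int, Tuple[str, str]] = {
    0: ("balance-rr", "static"),
    1: ("active-backup", "none"),
    2: ("balance-xor", "static"),
    3: ("broadcast", "static"),
    4: ("802.3ad", "lacp"),
    5: ("balance-tlb", "none"),
    6: ("balance-alb", "none"),
}


def handled_by(mode: str) -> str:
    """Modes that need any switch-side lag config go to the lag module; the rest to the plugin."""
    for name, switch in BONDING_MODES.values():
        if name == mode:
            return "lag module" if switch != "none" else "bonding plugin"
    raise ValueError(f"unknown bonding mode: {mode}")


if __name__ == "__main__":
    for num in sorted(BONDING_MODES):
        name, switch = BONDING_MODES[num]
        print(f"{num} {name:<14} switch={switch:<7} -> {handled_by(name)}")
```

Only mode 4 maps to "lacp"; modes 0, 2 and 3 still need a static lag on the switch, which is why they land in the lag module too.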

ipspace marked this pull request as draft December 23, 2024 06:16
jbemmel force-pushed the linux_add_lag_support branch from e4439bf to def1a35 December 23, 2024 17:18

jbemmel (Collaborator, Author) commented Dec 23, 2024

One issue: we need to execute the ip command on the host, within the container's network namespace. This already happens in the initial module when the bond device is created, but bond members are added during lag module execution, which (currently) happens inside the container and therefore depends on ip (iproute2) being installed there.

I'm thinking we need a mechanism to control the execution context of provisioning scripts for Linux containers: either on the host, in the container's network namespace (for the initial and lag modules), or "normal" execution inside the container, as is done for dhcp. Perhaps a filename convention, e.g. linux-host.j2 for network-namespace execution and the regular linux.j2 for current practice.

Thoughts?
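A sketch of how the proposed filename convention could drive template selection. It assumes templates live under a per-module directory (as in netsim/ansible/templates/lag/linux.j2); execution_context and pick_template are hypothetical helpers, and linux-host.j2 is the name proposed above, not an existing netlab convention.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional, Tuple


def execution_context(template: str) -> str:
    """Proposed convention: *-host.j2 runs on the host inside the container's netns."""
    return "host-netns" if PurePosixPath(template).name.endswith("-host.j2") else "container"


def pick_template(module: str, available: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Prefer a host-side template for Linux containers, falling back to the regular one."""
    present = set(available)
    for candidate in (f"{module}/linux-host.j2", f"{module}/linux.j2"):
        if candidate in present:
            return candidate, execution_context(candidate)
    return None  # module has no Linux template at all


if __name__ == "__main__":
    templates = ["lag/linux-host.j2", "lag/linux.j2", "dhcp/linux.j2"]
    print(pick_template("lag", templates))
    print(pick_template("dhcp", templates))
```

Keeping the context in the filename means existing linux.j2 templates keep their current behavior without any extra metadata.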

jbemmel force-pushed the linux_add_lag_support branch from e7bb543 to 130ad54 December 25, 2024 13:54
jbemmel marked this pull request as ready for review December 25, 2024 17:23
jbemmel force-pushed the linux_add_lag_support branch from 8c43217 to 4c5f63a December 30, 2024 12:37
ipspace marked this pull request as draft December 31, 2024 08:35

ipspace (Owner) commented Dec 31, 2024

Looks like you're still working on this one. When you're done, please mark it as "ready for review".

jbemmel marked this pull request as ready for review December 31, 2024 13:04

jbemmel (Collaborator, Author) commented Dec 31, 2024

> Looks like you're still working on this one. When you're done, please mark it as "ready for review"

It's ready; all I did was rebase it. I'm waiting on this one before merging #1646.

ipspace merged commit 00a988b into ipspace:dev on Jan 1, 2025
6 checks passed

2 participants