This repository has been archived by the owner on Mar 31, 2023. It is now read-only.

Adding a node with previous Weave Net setup can cause failures #285

Open
bboreham opened this issue Jul 30, 2020 · 9 comments
Labels
chore Related to fix/refinement/improvement of end user or new/existing developer functionality

Comments

@bboreham
Contributor

Weave Net uses a Linux bridge device, which will get an IP address assigned from the pod IP range.
If you do something like remove a node from one cluster and add it to another, the bridge may retain an IP address, and that address could now duplicate the IP of a pod or another bridge.

This will cause weird failures, as ARP resolves the IP to one device or the other arbitrarily.

Maybe we could have a command to clear down Weave Net on the node at install time, a bit like kubeadm reset?
See also weaveworks/weave#2911

@bboreham bboreham added the chore Related to fix/refinement/improvement of end user or new/existing developer functionality label Jul 30, 2020
@chanwit
Member

chanwit commented Jul 30, 2020

Read weaveworks/weave#2911 and saw the weave reset command.

@bboreham
Contributor Author

Note that weave reset only works with Docker. Currently wksctl is installing Docker, but should probably move to containerd or something else. It would be nice not to add a new dependency when trying to solve this problem.

In other words, we should create something similar to weave reset but with a smaller footprint, more tightly aimed at the Weave-Net-on-Kubernetes case.

@chanwit
Member

chanwit commented Jul 30, 2020

Got it. Thank you, Bryan.

Reading the weave reset logic, these are the steps we should replicate:

  1. Remove weave container (might be specific to Docker?)
  2. rm -f $HOST_ROOT/var/lib/weave/weave-netdata.db
  3. rm -f $HOST_ROOT/var/lib/weave/weavedata.db
  4. destroy bridge
  5. ip link del for all interface name = v${CONTAINER_NAME}pl

@chanwit
Member

chanwit commented Jul 30, 2020

> In other words, we should create something similar to weave reset but with a smaller footprint, more tightly aimed at the Weave-Net-on-Kubernetes case.

What are you expecting to see here? I'm guessing it's a new Kubernetes-specific command, like weave kube-reset?

@chanwit
Member

chanwit commented Jul 30, 2020

I'll tentatively call this command weave kube-reset.
What I should do is add this command before wksctl installs the weave-net addon.

@chanwit
Member

chanwit commented Jul 30, 2020

How do we validate that this command works correctly?
By checking that the weave bridge gets deleted?
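
One way to sketch such a check is to look for leftover devices in the output of `ip -o link show`. The function name and the device names (`weave`, `datapath`, `vethwe-*`, `vxlan-6784`) are assumptions about a default Weave Net install, not something confirmed in this thread:

```shell
#!/bin/sh
# Sketch only: list leftover Weave Net devices found in `ip -o link show`
# output read from stdin. All device names below are assumptions.
leftover_weave_links() {
    # Field 2 looks like "weave:" or "vethwe-bridge@vethwe-datapath:".
    awk '{
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon
        sub(/@.*$/, "", name)    # drop the "@peer" suffix of veth devices
        if (name == "weave" || name == "datapath" ||
            name ~ /^vethwe/ || name == "vxlan-6784")
            print name
    }'
}

# Usage on a node: prints "clean" only when nothing is left over.
# [ -z "$(ip -o link show | leftover_weave_links)" ] && echo "clean"
```

Reading from stdin keeps the parsing separate from the node itself, so the check can be tried against captured output first.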

@bboreham
Contributor Author

bboreham commented Jul 30, 2020

> What I should do is add this command before wksctl installs the weave-net addon.

Note the addon is installed once for the cluster, whereas we want to do this 'reset' action for every node, even if the cluster has been running for a month.

> Remove weave container (might be specific to Docker?)

Yes, exactly, any 'weave container' would be managed by Kubernetes.

> rm -f $HOST_ROOT/var/lib/weave/weave-netdata.db
> rm -f $HOST_ROOT/var/lib/weave/weavedata.db

I think one of these is ancient.

> destroy bridge

And other devices - datapath etc.

> ip link del for all interface name = v${CONTAINER_NAME}pl

I guess keep it, if the code is there already. They should disappear anyway when the owning containers disappear.

I would probably also expect it to delete the CNI config and binaries.

Maybe remove iptables rules?
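
The items above could be collected into a per-node plan along these lines. This is a dry-run sketch that only prints commands. `HOST_ROOT`, the two `.db` files and the `WEAVE` / `WEAVE-EXPOSE` chains come from this thread; the function name, the device names and the CNI paths are assumptions about a default Weave Net install. How to remove the datapath device is left open here.

```shell
#!/bin/sh
# Sketch only: print (do not run) a per-node cleanup plan.
# Device names and CNI paths are assumptions, not confirmed in the thread.
weave_kube_reset_plan() {
    host_root="${1:-}"
    # Persisted state; rm -f tolerates a file that no longer exists.
    echo "rm -f $host_root/var/lib/weave/weave-netdata.db"
    echo "rm -f $host_root/var/lib/weave/weavedata.db"
    # Bridge and other devices (assumed names).
    for dev in weave vethwe-bridge vxlan-6784; do
        echo "ip link del $dev"
    done
    # CNI config and binaries (assumed paths).
    echo "rm -f $host_root/etc/cni/net.d/10-weave.conflist"
    echo "rm -f $host_root/opt/cni/bin/weave-net $host_root/opt/cni/bin/weave-ipam"
    # Flush the chains named in the thread: "<table> <chain>" pairs.
    for spec in "nat WEAVE" "filter WEAVE-EXPOSE"; do
        echo "iptables -t ${spec% *} -F ${spec#* }"
    done
}

# Usage: review the plan before running anything.
# weave_kube_reset_plan "$HOST_ROOT"
```

Printing the plan first makes it easy to review on a node that has been in a cluster for a while, before anything is deleted.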

@chanwit
Member

chanwit commented Aug 2, 2020

Roughly, I'm finding that the following code might work:

    kube-reset)
        # Remove persisted Weave Net state; ignore files that do not exist.
        rm -f "$HOST_ROOT/var/lib/weave/weave-netdata.db" >/dev/null 2>&1 || true
        rm -f "$HOST_ROOT/var/lib/weave/weavedata.db"     >/dev/null 2>&1 || true
        # Consume any CIDR arguments.
        collect_cidr_args "$@"
        shift "$CIDR_ARG_COUNT"
        # Requires ALL_CIDRS to be set; how to obtain it is still open.
        # This loop runs before destroy_bridge because the check below
        # needs the bridge to still exist.
        for CIDR in $ALL_CIDRS ; do
            if ip addr show dev "$BRIDGE" | grep -qF "$CIDR" ; then
                ip addr del dev "$BRIDGE" "$CIDR"
                delete_iptables_rule nat WEAVE -d "$CIDR" ! -s "$CIDR" -j MASQUERADE
                delete_iptables_rule nat WEAVE -s "$CIDR" ! -d "$CIDR" -j MASQUERADE
                delete_iptables_rule filter WEAVE-EXPOSE -d "$CIDR" -j ACCEPT
            fi
        done
        destroy_bridge
        # Delete leftover veth devices; ${LOCAL_IFNAME%@*} strips the "@peer" suffix.
        for LOCAL_IFNAME in $(ip link show | grep "v${CONTAINER_IFNAME}pl" | cut -d ' ' -f 2 | tr -d ':') ; do
            ip link del "${LOCAL_IFNAME%@*}" >/dev/null 2>&1 || true
        done
        ;;

This new weave kube-reset command might contain delete_iptables_rule for WEAVE.
wdyt?

The current problem is that I need to obtain the CIDR used by the current installation. I'm not sure what the best way is to obtain that CIDR without calling weave, as we cannot expect the weave-net binary to be running.
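
One possible approach, sketched under assumptions, is to read the address and prefix length straight off the bridge with `ip -o -4 addr show`, which does not need weave to be running. The function name and the bridge name `weave` are assumptions, and the value found this way may not match the CIDR that was used when the iptables rules were created:

```shell
#!/bin/sh
# Sketch only: print the IPv4 address/prefix entries of one device, given
# `ip -o -4 addr show` output on stdin.
bridge_cidrs() {
    # Lines look like:
    #   4: weave    inet 10.32.0.1/12 brd 10.47.255.255 scope global weave
    awk -v dev="$1" '$2 == dev && $3 == "inet" { print $4 }'
}

# Usage on a node (the bridge name "weave" is an assumption):
# for CIDR in $(ip -o -4 addr show | bridge_cidrs weave); do
#     ip addr del dev weave "$CIDR"
# done
```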

@chanwit
Member

chanwit commented Aug 2, 2020

The above code still needs tweaking, as I don't yet fully understand all the variables there. Some might be specific to Docker, for example ${CONTAINER_NAME}.
